2008-07-31 03:06:12 +08:00
/*
2015-03-18 17:46:04 +08:00
 * Copyright © 2008-2015 Intel Corporation
2008-07-31 03:06:12 +08:00
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 *
 * Authors:
 *    Eric Anholt <eric@anholt.net>
 *
 */

2013-07-25 03:07:52 +08:00
#include <drm/drm_vma_manager.h>
2012-10-03 01:01:07 +08:00
#include <drm/i915_drm.h>
2016-11-15 04:41:05 +08:00
#include <linux/dma-fence-array.h>
2017-02-13 01:20:01 +08:00
#include <linux/kthread.h>
2019-08-11 16:06:32 +08:00
#include <linux/dma-resv.h>
2011-06-28 07:18:18 +08:00
#include <linux/shmem_fs.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
#include <linux/slab.h>
2016-11-22 22:41:21 +08:00
#include <linux/stop_machine.h>
2008-07-31 03:06:12 +08:00
#include <linux/swap.h>
DRM: i915: add mode setting support
This commit adds i915 driver support for the DRM mode setting APIs.
Currently, VGA, LVDS, SDVO DVI & VGA, TV and DVO LVDS outputs are
supported. HDMI, DisplayPort and additional SDVO output support will
follow.
Support for the mode setting code is controlled by the new 'modeset'
module option. A new config option, CONFIG_DRM_I915_KMS controls the
default behavior, and whether a PCI ID list is built into the module for
use by user level module utilities.
Note that if mode setting is enabled, user level drivers that access
display registers directly or that don't use the kernel graphics memory
manager will likely corrupt kernel graphics memory, disrupt output
configuration (possibly leading to hangs and/or blank displays), and
prevent panic/oops messages from appearing. So use caution when
enabling this code; be sure your user level code supports the new
interfaces.
A new SysRq key, 'g', provides emergency support for switching back to
the kernel's framebuffer console; which is useful for testing.
Co-authors: Dave Airlie <airlied@linux.ie>, Hong Liu <hong.liu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-11-08 06:24:08 +08:00
#include <linux/pci.h>
2012-05-10 21:25:09 +08:00
#include <linux/dma-buf.h>
2019-01-18 05:03:34 +08:00
#include <linux/mman.h>
2008-07-31 03:06:12 +08:00

2019-06-13 16:44:16 +08:00
#include "display/intel_display.h"
#include "display/intel_frontbuffer.h"

2019-05-28 17:29:49 +08:00
#include "gem/i915_gem_clflush.h"
#include "gem/i915_gem_context.h"
2019-05-28 17:29:43 +08:00
#include "gem/i915_gem_ioctls.h"
2019-05-28 17:29:49 +08:00
#include "gem/i915_gem_pm.h"
2019-08-06 20:43:00 +08:00
#include "gt/intel_engine_user.h"
2019-06-21 15:08:02 +08:00
#include "gt/intel_gt.h"
drm/i915: Invert the GEM wakeref hierarchy
In the current scheme, on submitting a request we take a single global
GEM wakeref, which trickles down to wake up all GT power domains. This
is undesirable as we would like to be able to localise our power
management to the available power domains and to remove the global GEM
operations from the heart of the driver. (The intent there is to push
global GEM decisions to the boundary as used by the GEM user interface.)
Now during request construction, each request is responsible via its
logical context to acquire a wakeref on each power domain it intends to
utilize. Currently, each request takes a wakeref on the engine(s) and
the engines themselves take a chipset wakeref. This gives us a
transition on each engine which we can extend if we want to insert more
power management control (such as soft rc6). The global GEM operations
that currently require a struct_mutex are reduced to listening to pm
events from the chipset GT wakeref. As we reduce the struct_mutex
requirement, these listeners should evaporate.
Perhaps the biggest immediate change is that this removes the
struct_mutex requirement around GT power management, allowing us greater
flexibility in request construction. Another important knock-on effect,
is that by tracking engine usage, we can insert a switch back to the
kernel context on that engine immediately, avoiding any extra delay or
inserting global synchronisation barriers. This makes tracking when an
engine and its associated contexts are idle much easier -- important for
when we forgo our assumed execution ordering and need idle barriers to
unpin used contexts. In the process, it means we remove a large chunk of
code whose only purpose was to switch back to the kernel context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
2019-04-25 04:07:17 +08:00
#include "gt/intel_gt_pm.h"
2019-04-25 01:48:39 +08:00
#include "gt/intel_mocs.h"
#include "gt/intel_reset.h"
2019-07-29 19:37:20 +08:00
#include "gt/intel_renderstate.h"
2019-04-25 01:48:39 +08:00
#include "gt/intel_workarounds.h"

2019-01-16 23:33:04 +08:00
#include "i915_drv.h"
2019-05-28 17:29:50 +08:00
#include "i915_scatterlist.h"
2019-01-16 23:33:04 +08:00
#include "i915_trace.h"
#include "i915_vgpu.h"

2019-04-05 19:00:15 +08:00
#include "intel_pm.h"
2019-01-16 23:33:04 +08:00

2016-06-10 16:53:01 +08:00
static int
drm/i915: Pull i915_vma_pin under the vm->mutex
Replace the struct_mutex requirement for pinning the i915_vma with the
local vm->mutex instead. Note that the vm->mutex is tainted by the
shrinker (we require unbinding from inside fs-reclaim) and so we cannot
allocate while holding that mutex. Instead we have to preallocate
workers to allocate and apply the PTE updates after we have
reserved their slot in the drm_mm (using fences to order the PTE writes
with the GPU work and with later unbind).
In adding the asynchronous vma binding, one subtle requirement is to
avoid coupling the binding fence into the backing object->resv. That is
the asynchronous binding only applies to the vma timeline itself and not
to the pages as that is a more global timeline (the binding of one vma
does not need to be ordered with another vma, nor does the implicit GEM
fencing depend on a vma, only on writes to the backing store). Keeping
the vma binding distinct from the backing store timelines is verified by
a number of async gem_exec_fence and gem_exec_schedule tests. The way we
do this is quite simple, we keep the fence for the vma binding separate
and only wait on it as required, and never add it to the obj->resv
itself.
Another consequence in reducing the locking around the vma is the
destruction of the vma is no longer globally serialised by struct_mutex.
A natural solution would be to add a kref to i915_vma, but that requires
decoupling the reference cycles, possibly by introducing a new
i915_mm_pages object that is owned by both obj->mm and vma->pages.
However, we have not taken that route due to the overshadowing lmem/ttm
discussions, and instead play a series of complicated games with
trylocks to (hopefully) ensure that only one destruction path is called!
v2: Add some commentary, and some helpers to reduce patch churn.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
2019-10-04 21:39:58 +08:00
insert_mappable_node(struct i915_ggtt *ggtt, struct drm_mm_node *node, u32 size)
2016-06-10 16:53:01 +08:00
{
2019-10-04 21:39:58 +08:00
	int err;

	err = mutex_lock_interruptible(&ggtt->vm.mutex);
	if (err)
		return err;

2016-06-10 16:53:01 +08:00
	memset(node, 0, sizeof(*node));
2019-10-04 21:39:58 +08:00
	err = drm_mm_insert_node_in_range(&ggtt->vm.mm, node,
					  size, 0, I915_COLOR_UNEVICTABLE,
					  0, ggtt->mappable_end,
					  DRM_MM_INSERT_LOW);

	mutex_unlock(&ggtt->vm.mutex);

	return err;
2016-06-10 16:53:01 +08:00
}

static void
2019-10-04 21:39:58 +08:00
remove_mappable_node(struct i915_ggtt *ggtt, struct drm_mm_node *node)
2016-06-10 16:53:01 +08:00
{
2019-10-04 21:39:58 +08:00
	mutex_lock(&ggtt->vm.mutex);
2016-06-10 16:53:01 +08:00
	drm_mm_remove_node(node);
2019-10-04 21:39:58 +08:00
	mutex_unlock(&ggtt->vm.mutex);
2016-06-10 16:53:01 +08:00
}

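The two helpers above follow one pattern: take the address-space mutex, mutate the range allocator, drop the mutex. The following is a minimal userspace sketch of that pattern, not the driver's code. Every `toy_*` name is hypothetical, a pthread mutex stands in for `ggtt->vm.mutex`, and a simple bump pointer stands in for `drm_mm` (which is a real range allocator and behaves quite differently). Unlike `mutex_lock_interruptible()`, `pthread_mutex_lock()` here is not interruptible by signals.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for struct drm_mm_node and struct i915_ggtt. */
struct toy_node {
	uint64_t start;
	uint64_t size;
};

struct toy_ggtt {
	pthread_mutex_t mutex;	/* plays the role of ggtt->vm.mutex */
	uint64_t mappable_end;	/* upper bound of the mappable range */
	uint64_t next_free;	/* bump pointer; an assumption, not how drm_mm works */
};

/* Reserve 'size' bytes below mappable_end while holding the mutex. */
static int toy_insert_mappable_node(struct toy_ggtt *ggtt,
				    struct toy_node *node, uint64_t size)
{
	int err = 0;

	pthread_mutex_lock(&ggtt->mutex);

	/* Start from a zeroed node, as the original does with memset(). */
	memset(node, 0, sizeof(*node));
	if (size == 0 || size > ggtt->mappable_end - ggtt->next_free) {
		err = -ENOSPC;
	} else {
		node->start = ggtt->next_free;
		node->size = size;
		ggtt->next_free += size;
	}

	pthread_mutex_unlock(&ggtt->mutex);
	return err;
}

/* Release a node under the same mutex. */
static void toy_remove_mappable_node(struct toy_ggtt *ggtt,
				     struct toy_node *node)
{
	pthread_mutex_lock(&ggtt->mutex);
	/* In this toy only the most recent allocation is handed back. */
	if (node->size && node->start + node->size == ggtt->next_free)
		ggtt->next_free = node->start;
	memset(node, 0, sizeof(*node));
	pthread_mutex_unlock(&ggtt->mutex);
}
```

The point of the sketch is only the locking discipline: both the insert and the remove path touch the allocator state exclusively while the mutex is held, and the insert path reports failure through its return value rather than leaving a half-initialised node.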
2008-10-23 12:40:13 +08:00
int
i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
2010-11-09 03:18:58 +08:00
			    struct drm_file *file)
2008-10-23 12:40:13 +08:00
{
2019-01-28 18:23:53 +08:00
	struct i915_ggtt *ggtt = &to_i915(dev)->ggtt;
2016-03-30 21:57:10 +08:00
	struct drm_i915_gem_get_aperture *args = data;
2015-07-01 18:51:10 +08:00
	struct i915_vma *vma;
2017-05-31 10:35:52 +08:00
	u64 pinned;
2008-10-23 12:40:13 +08:00

2019-10-04 21:39:58 +08:00
	if (mutex_lock_interruptible(&ggtt->vm.mutex))
		return -EINTR;
2019-01-28 18:23:53 +08:00

2018-06-05 23:37:58 +08:00
	pinned = ggtt->vm.reserved;
2019-01-28 18:23:52 +08:00
	list_for_each_entry(vma, &ggtt->vm.bound_list, vm_link)
2016-08-04 23:32:30 +08:00
		if (i915_vma_is_pinned(vma))
2015-07-01 18:51:10 +08:00
			pinned += vma->node.size;
2019-01-28 18:23:53 +08:00

	mutex_unlock(&ggtt->vm.mutex);
2008-10-23 12:40:13 +08:00

2018-06-05 23:37:58 +08:00
	args->aper_size = ggtt->vm.total;
2011-08-17 03:34:10 +08:00
	args->aper_available_size = args->aper_size - pinned;
2010-11-24 20:23:44 +08:00

2008-10-23 12:40:13 +08:00
	return 0;
}

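The accounting in the ioctl above is simple arithmetic: start from the reserved size, add the size of every pinned VMA, and subtract the sum from the total. A small userspace sketch of that calculation follows. The `toy_vma` type and the function name are hypothetical, and a plain array replaces the kernel's `bound_list`; no locking is modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of struct i915_vma used here. */
struct toy_vma {
	uint64_t size;	/* corresponds to vma->node.size */
	bool pinned;	/* corresponds to i915_vma_is_pinned(vma) */
};

/*
 * Return the space still available: total minus the reserved space and
 * minus every pinned VMA. Unpinned VMAs are not counted, mirroring the
 * loop in the ioctl.
 */
static uint64_t toy_aperture_available(uint64_t total, uint64_t reserved,
				       const struct toy_vma *vmas,
				       size_t count)
{
	uint64_t pinned = reserved;
	size_t i;

	for (i = 0; i < count; i++)
		if (vmas[i].pinned)
			pinned += vmas[i].size;

	return total - pinned;
}
```

For example, with a 1 MiB total, 4 KiB reserved, and two pinned 4 KiB VMAs alongside one unpinned 8 KiB VMA, the pinned sum is 12 KiB and the function returns 1048576 - 12288 = 1036288 bytes.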
2019-07-03 17:17:17 +08:00
int i915_gem_object_unbind(struct drm_i915_gem_object *obj,
			   unsigned long flags)
2016-08-04 14:52:27 +08:00
{
	struct i915_vma *vma;
	LIST_HEAD(still_in_list);
2019-05-28 17:29:51 +08:00
	int ret = 0;
2016-08-15 01:44:41 +08:00

2019-01-28 18:23:54 +08:00
	spin_lock(&obj->vma.lock);
	while (!ret && (vma = list_first_entry_or_null(&obj->vma.list,
						       struct i915_vma,
						       obj_link))) {
2019-10-04 21:39:58 +08:00
		struct i915_address_space *vm = vma->vm;

		ret = -EBUSY;
		if (!i915_vm_tryopen(vm))
			break;

2016-08-04 14:52:27 +08:00
		list_move_tail(&vma->obj_link, &still_in_list);
2019-01-28 18:23:54 +08:00
		spin_unlock(&obj->vma.lock);

2019-07-03 17:17:17 +08:00
		if (flags & I915_GEM_OBJECT_UNBIND_ACTIVE ||
		    !i915_vma_is_active(vma))
			ret = i915_vma_unbind(vma);
2019-01-28 18:23:54 +08:00

drm/i915: Pull i915_vma_pin under the vm->mutex
Replace the struct_mutex requirement for pinning the i915_vma with the
local vm->mutex instead. Note that the vm->mutex is tainted by the
shrinker (we require unbinding from inside fs-reclaim) and so we cannot
allocate while holding that mutex. Instead we have to preallocate
workers to do allocate and apply the PTE updates after we have we
reserved their slot in the drm_mm (using fences to order the PTE writes
with the GPU work and with later unbind).
In adding the asynchronous vma binding, one subtle requirement is to
avoid coupling the binding fence into the backing object->resv. That is
the asynchronous binding only applies to the vma timeline itself and not
to the pages as that is a more global timeline (the binding of one vma
does not need to be ordered with another vma, nor does the implicit GEM
fencing depend on a vma, only on writes to the backing store). Keeping
the vma binding distinct from the backing store timelines is verified by
a number of async gem_exec_fence and gem_exec_schedule tests. The way we
do this is quite simple, we keep the fence for the vma binding separate
and only wait on it as required, and never add it to the obj->resv
itself.
Another consequence in reducing the locking around the vma is the
destruction of the vma is no longer globally serialised by struct_mutex.
A natural solution would be to add a kref to i915_vma, but that requires
decoupling the reference cycles, possibly by introducing a new
i915_mm_pages object that is owned by both obj->mm and vma->pages.
However, we have not taken that route due to the overshadowing lmem/ttm
discussions, and instead play a series of complicated games with
trylocks to (hopefully) ensure that only one destruction path is called!
v2: Add some commentary, and some helpers to reduce patch churn.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
2019-10-04 21:39:58 +08:00
|
|
|
i915_vm_close(vm);
|
2019-01-28 18:23:54 +08:00
|
|
|
spin_lock(&obj->vma.lock);
|
2016-08-04 14:52:27 +08:00
|
|
|
}
|
2019-01-28 18:23:54 +08:00
|
|
|
list_splice(&still_in_list, &obj->vma.list);
|
|
|
|
spin_unlock(&obj->vma.lock);
|
2016-08-04 14:52:27 +08:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2014-05-21 19:42:56 +08:00
|
|
|
static int
|
|
|
|
i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
|
|
|
|
struct drm_i915_gem_pwrite *args,
|
2016-10-28 20:58:36 +08:00
|
|
|
struct drm_file *file)
|
2014-05-21 19:42:56 +08:00
|
|
|
{
|
|
|
|
void *vaddr = obj->phys_handle->vaddr + args->offset;
|
2016-04-26 23:32:27 +08:00
|
|
|
char __user *user_data = u64_to_user_ptr(args->data_ptr);
|
2014-11-04 20:51:40 +08:00
|
|
|
|
2019-08-16 15:46:35 +08:00
|
|
|
/*
|
|
|
|
* We manually control the domain here and pretend that it
|
2014-11-04 20:51:40 +08:00
|
|
|
* remains coherent i.e. in the GTT domain, like shmem_pwrite.
|
|
|
|
*/
|
2019-08-16 15:46:35 +08:00
|
|
|
intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CPU);
|
|
|
|
|
2017-01-06 23:22:38 +08:00
|
|
|
if (copy_from_user(vaddr, user_data, args->size))
|
|
|
|
return -EFAULT;
|
2014-05-21 19:42:56 +08:00
|
|
|
|
2014-11-04 20:51:40 +08:00
|
|
|
drm_clflush_virt_range(vaddr, args->size);
|
2019-06-21 15:08:02 +08:00
|
|
|
intel_gt_chipset_flush(&to_i915(obj->base.dev)->gt);
|
2015-02-14 03:23:45 +08:00
|
|
|
|
2019-08-16 15:46:35 +08:00
|
|
|
intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU);
|
2017-01-06 23:22:38 +08:00
|
|
|
return 0;
|
2014-05-21 19:42:56 +08:00
|
|
|
}
|
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
static int
|
|
|
|
i915_gem_create(struct drm_file *file,
|
2016-12-01 22:16:37 +08:00
|
|
|
struct drm_i915_private *dev_priv,
|
2019-03-27 01:02:18 +08:00
|
|
|
u64 *size_p,
|
2019-01-16 17:15:19 +08:00
|
|
|
u32 *handle_p)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2009-08-23 17:40:55 +08:00
|
|
|
u32 handle;
|
2019-03-27 01:02:18 +08:00
|
|
|
u64 size;
|
|
|
|
int ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2019-03-27 01:02:18 +08:00
|
|
|
size = round_up(*size_p, PAGE_SIZE);
|
2011-09-14 20:14:28 +08:00
|
|
|
if (size == 0)
|
|
|
|
return -EINVAL;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
|
|
|
/* Allocate the new object */
|
2019-05-28 17:29:45 +08:00
|
|
|
obj = i915_gem_object_create_shmem(dev_priv, size);
|
2016-04-25 20:32:13 +08:00
|
|
|
if (IS_ERR(obj))
|
|
|
|
return PTR_ERR(obj);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-11-09 03:18:58 +08:00
|
|
|
ret = drm_gem_handle_create(file, &obj->base, &handle);
|
2010-10-14 20:20:40 +08:00
|
|
|
/* drop reference from allocate - handle holds it now */
|
2016-10-28 20:58:43 +08:00
|
|
|
i915_gem_object_put(obj);
|
2013-07-25 05:25:03 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2010-10-14 20:20:40 +08:00
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
*handle_p = handle;
|
2019-04-17 21:25:07 +08:00
|
|
|
*size_p = size;
|
2008-07-31 03:06:12 +08:00
|
|
|
return 0;
|
|
|
|
}
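The size handling in i915_gem_create() above can be tried in isolation. The following is a minimal userspace sketch, not the driver code: `sketch_round_size` and `SKETCH_PAGE_SIZE` are hypothetical names, and a 4096-byte page is an assumption. It rounds the requested size up to a whole number of pages and rejects a result of zero, which also catches requests so large that the rounding wraps around.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ull

/*
 * Hypothetical helper: round *size_p up to a page multiple, in place.
 * Returns 0 on success, or -EINVAL if the rounded size is zero
 * (a zero request, or a request that wraps when rounded up).
 */
static int sketch_round_size(uint64_t *size_p)
{
	uint64_t size = (*size_p + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);

	if (size == 0)
		return -EINVAL;

	*size_p = size;
	return 0;
}
```

As in the function above, the caller's size is only updated on success, so a caller can read back the size that was actually allocated.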
|
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
int
|
|
|
|
i915_gem_dumb_create(struct drm_file *file,
|
|
|
|
struct drm_device *dev,
|
|
|
|
struct drm_mode_create_dumb *args)
|
|
|
|
{
|
2019-05-09 20:21:57 +08:00
|
|
|
int cpp = DIV_ROUND_UP(args->bpp, 8);
|
|
|
|
u32 format;
|
|
|
|
|
|
|
|
switch (cpp) {
|
|
|
|
case 1:
|
|
|
|
format = DRM_FORMAT_C8;
|
|
|
|
break;
|
|
|
|
case 2:
|
|
|
|
format = DRM_FORMAT_RGB565;
|
|
|
|
break;
|
|
|
|
case 4:
|
|
|
|
format = DRM_FORMAT_XRGB8888;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
/* have to work out size/pitch and return them */
|
2019-05-09 20:21:57 +08:00
|
|
|
args->pitch = ALIGN(args->width * cpp, 64);
|
|
|
|
|
|
|
|
/* align stride to page size so that we can remap */
|
|
|
|
if (args->pitch > intel_plane_fb_max_stride(to_i915(dev), format,
|
|
|
|
DRM_FORMAT_MOD_LINEAR))
|
|
|
|
args->pitch = ALIGN(args->pitch, 4096);
|
|
|
|
|
2011-02-07 10:16:14 +08:00
|
|
|
args->size = args->pitch * args->height;
|
2016-12-01 22:16:37 +08:00
|
|
|
return i915_gem_create(file, to_i915(dev),
|
2019-03-27 01:02:18 +08:00
|
|
|
&args->size, &args->handle);
|
2011-02-07 10:16:14 +08:00
|
|
|
}
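The pitch and size arithmetic in i915_gem_dumb_create() above can be checked with a small userspace sketch. The names `sketch_dumb`, `sketch_dumb_layout`, and the `SKETCH_*` macros are hypothetical, and `max_stride` is a plain parameter standing in for the value the driver gets from intel_plane_fb_max_stride(). Unlike the driver, the sketch widens the final multiplication to 64 bits.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for DIV_ROUND_UP() and ALIGN(). */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define SKETCH_ALIGN(x, a) (((uint64_t)(x) + ((a) - 1)) & ~((uint64_t)(a) - 1))

struct sketch_dumb {
	uint32_t width, height, bpp;	/* inputs */
	uint32_t pitch;			/* output: bytes per scanline */
	uint64_t size;			/* output: bytes for the whole buffer */
};

/* Returns 0 on success, -EINVAL for a bpp that maps to no known format. */
static int sketch_dumb_layout(struct sketch_dumb *d, uint32_t max_stride)
{
	uint32_t cpp = SKETCH_DIV_ROUND_UP(d->bpp, 8);

	/* Only 1, 2 and 4 bytes per pixel are accepted, as in the switch above. */
	if (cpp != 1 && cpp != 2 && cpp != 4)
		return -EINVAL;

	/* Scanlines are 64-byte aligned by default. */
	d->pitch = (uint32_t)SKETCH_ALIGN((uint64_t)d->width * cpp, 64);

	/* Past the stride limit, align to 4096 bytes so the buffer can be remapped. */
	if (d->pitch > max_stride)
		d->pitch = (uint32_t)SKETCH_ALIGN(d->pitch, 4096);

	d->size = (uint64_t)d->pitch * d->height;
	return 0;
}
```

For a 1920x1080 buffer at 32 bpp this gives a pitch of 7680 bytes when the stride limit is not exceeded, and 8192 bytes when it is.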
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Creates a new mm object and returns a handle to it.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device pointer
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file pointer
|
2011-02-07 10:16:14 +08:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_create_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file)
|
|
|
|
{
|
2016-12-01 22:16:37 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2011-02-07 10:16:14 +08:00
|
|
|
struct drm_i915_gem_create *args = data;
|
2012-04-23 22:50:50 +08:00
|
|
|
|
2016-12-01 22:16:37 +08:00
|
|
|
i915_gem_flush_free_objects(dev_priv);
|
2016-10-28 20:58:42 +08:00
|
|
|
|
2016-12-01 22:16:37 +08:00
|
|
|
return i915_gem_create(file, dev_priv,
|
2019-03-27 01:02:18 +08:00
|
|
|
&args->size, &args->handle);
|
2011-02-07 10:16:14 +08:00
|
|
|
}
|
|
|
|
|
2012-03-26 01:47:40 +08:00
|
|
|
static int
|
2019-01-05 20:07:58 +08:00
|
|
|
shmem_pread(struct page *page, int offset, int len, char __user *user_data,
|
|
|
|
bool needs_clflush)
|
2012-03-26 01:47:40 +08:00
|
|
|
{
|
|
|
|
char *vaddr;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
vaddr = kmap(page);
|
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
if (needs_clflush)
|
|
|
|
drm_clflush_virt_range(vaddr + offset, len);
|
2016-10-28 20:58:39 +08:00
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
ret = __copy_to_user(user_data, vaddr + offset, len);
|
2016-10-28 20:58:39 +08:00
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
kunmap(page);
|
2016-10-28 20:58:39 +08:00
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
return ret ? -EFAULT : 0;
|
2016-10-28 20:58:39 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
i915_gem_shmem_pread(struct drm_i915_gem_object *obj,
|
|
|
|
struct drm_i915_gem_pread *args)
|
|
|
|
{
|
|
|
|
unsigned int needs_clflush;
|
|
|
|
unsigned int idx, offset;
|
2019-05-28 17:29:51 +08:00
|
|
|
struct dma_fence *fence;
|
|
|
|
char __user *user_data;
|
|
|
|
u64 remain;
|
2016-10-28 20:58:39 +08:00
|
|
|
int ret;
|
|
|
|
|
2019-05-28 17:29:48 +08:00
|
|
|
ret = i915_gem_object_prepare_read(obj, &needs_clflush);
|
2016-10-28 20:58:39 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
fence = i915_gem_object_lock_fence(obj);
|
|
|
|
i915_gem_object_finish_access(obj);
|
|
|
|
if (!fence)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2016-10-28 20:58:39 +08:00
|
|
|
remain = args->size;
|
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
|
|
|
offset = offset_in_page(args->offset);
|
|
|
|
for (idx = args->offset >> PAGE_SHIFT; remain; idx++) {
|
|
|
|
struct page *page = i915_gem_object_get_page(obj, idx);
|
2018-10-12 22:02:28 +08:00
|
|
|
unsigned int length = min_t(u64, remain, PAGE_SIZE - offset);
|
2016-10-28 20:58:39 +08:00
|
|
|
|
|
|
|
ret = shmem_pread(page, offset, length, user_data,
|
|
|
|
needs_clflush);
|
|
|
|
if (ret)
|
|
|
|
break;
|
|
|
|
|
|
|
|
remain -= length;
|
|
|
|
user_data += length;
|
|
|
|
offset = 0;
|
|
|
|
}
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
i915_gem_object_unlock_fence(obj, fence);
|
2016-10-28 20:58:39 +08:00
|
|
|
return ret;
|
|
|
|
}
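The loop in i915_gem_shmem_pread() above walks the object one page at a time: the first chunk starts at the in-page offset, and every later chunk starts at offset zero. A minimal userspace sketch of the same indexing follows. `sketch_paged_read` and `SKETCH_PAGE_SIZE` are hypothetical names, the pages are ordinary memory buffers, and memcpy() stands in for the kmap/clflush/copy-to-user steps, so no error path is modelled.

```c
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: copy `size` bytes, starting at byte `offset` of an
 * object stored as an array of page-sized buffers, into `dst`.
 * The caller must ensure the range lies within the supplied pages.
 */
static void sketch_paged_read(unsigned char *const *pages, size_t offset,
			      size_t size, unsigned char *dst)
{
	size_t idx = offset / SKETCH_PAGE_SIZE;		/* first page touched */
	size_t in_page = offset % SKETCH_PAGE_SIZE;	/* offset within that page */
	size_t remain = size;

	for (; remain; idx++) {
		/* Copy up to the end of the current page, or less if we are done. */
		size_t length = SKETCH_PAGE_SIZE - in_page;

		if (length > remain)
			length = remain;

		memcpy(dst, pages[idx] + in_page, length);

		remain -= length;
		dst += length;
		in_page = 0;	/* later pages are read from their start */
	}
}
```

A read of 8 bytes at offset 4092 therefore takes 4 bytes from the end of page 0 and 4 bytes from the start of page 1.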
|
|
|
|
|
|
|
|
static inline bool
|
|
|
|
gtt_user_read(struct io_mapping *mapping,
|
|
|
|
loff_t base, int offset,
|
|
|
|
char __user *user_data, int length)
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
{
|
2017-09-02 01:12:52 +08:00
|
|
|
void __iomem *vaddr;
|
2016-10-28 20:58:39 +08:00
|
|
|
unsigned long unwritten;
|
2016-06-10 16:53:03 +08:00
|
|
|
|
|
|
|
/* We can use the cpu mem copy function because this is X86. */
|
2017-09-02 01:12:52 +08:00
|
|
|
vaddr = io_mapping_map_atomic_wc(mapping, base);
|
|
|
|
unwritten = __copy_to_user_inatomic(user_data,
|
|
|
|
(void __force *)vaddr + offset,
|
|
|
|
length);
|
2016-10-28 20:58:39 +08:00
|
|
|
io_mapping_unmap_atomic(vaddr);
|
|
|
|
if (unwritten) {
|
2017-09-02 01:12:52 +08:00
|
|
|
vaddr = io_mapping_map_wc(mapping, base, PAGE_SIZE);
|
|
|
|
unwritten = copy_to_user(user_data,
|
|
|
|
(void __force *)vaddr + offset,
|
|
|
|
length);
|
2016-10-28 20:58:39 +08:00
|
|
|
io_mapping_unmap(vaddr);
|
|
|
|
}
|
2016-06-10 16:53:03 +08:00
|
|
|
return unwritten;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2016-10-28 20:58:39 +08:00
|
|
|
i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
|
|
|
|
const struct drm_i915_gem_pread *args)
|
2016-06-10 16:53:03 +08:00
|
|
|
{
|
2016-10-28 20:58:39 +08:00
|
|
|
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
|
|
|
struct i915_ggtt *ggtt = &i915->ggtt;
|
2019-01-14 22:21:18 +08:00
|
|
|
intel_wakeref_t wakeref;
|
2016-06-10 16:53:03 +08:00
|
|
|
struct drm_mm_node node;
|
2019-05-28 17:29:51 +08:00
|
|
|
struct dma_fence *fence;
|
2016-10-28 20:58:39 +08:00
|
|
|
void __user *user_data;
|
2019-05-28 17:29:51 +08:00
|
|
|
struct i915_vma *vma;
|
2016-10-28 20:58:39 +08:00
|
|
|
u64 remain, offset;
|
2016-06-10 16:53:03 +08:00
|
|
|
int ret;
|
|
|
|
|
2019-06-14 07:21:54 +08:00
|
|
|
wakeref = intel_runtime_pm_get(&i915->runtime_pm);
|
2019-08-22 14:15:57 +08:00
|
|
|
vma = ERR_PTR(-ENODEV);
|
|
|
|
if (!i915_gem_object_is_tiled(obj))
|
|
|
|
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
|
|
|
|
PIN_MAPPABLE |
|
|
|
|
PIN_NONBLOCK /* NOWARN */ |
|
|
|
|
PIN_NOEVICT);
|
2016-08-19 00:16:45 +08:00
|
|
|
if (!IS_ERR(vma)) {
|
|
|
|
node.start = i915_ggtt_offset(vma);
|
2019-10-04 05:00:59 +08:00
|
|
|
node.flags = 0;
|
2019-08-22 14:15:57 +08:00
|
|
|
} else {
|
2016-10-28 20:58:39 +08:00
|
|
|
ret = insert_mappable_node(ggtt, &node, PAGE_SIZE);
|
2016-06-10 16:53:03 +08:00
|
|
|
if (ret)
|
2019-10-04 21:39:58 +08:00
|
|
|
goto out_rpm;
|
2019-10-04 05:00:58 +08:00
|
|
|
GEM_BUG_ON(!drm_mm_node_allocated(&node));
|
2016-06-10 16:53:03 +08:00
|
|
|
}
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
ret = i915_gem_object_lock_interruptible(obj);
|
2016-06-10 16:53:03 +08:00
|
|
|
if (ret)
|
|
|
|
goto out_unpin;
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, false);
|
|
|
|
if (ret) {
|
|
|
|
i915_gem_object_unlock(obj);
|
|
|
|
goto out_unpin;
|
|
|
|
}
|
|
|
|
|
|
|
|
fence = i915_gem_object_lock_fence(obj);
|
|
|
|
i915_gem_object_unlock(obj);
|
|
|
|
if (!fence) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto out_unpin;
|
|
|
|
}
|
2016-06-10 16:53:03 +08:00
|
|
|
|
2016-10-28 20:58:39 +08:00
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
|
|
|
remain = args->size;
|
|
|
|
offset = args->offset;
|
2016-06-10 16:53:03 +08:00
|
|
|
|
|
|
|
while (remain > 0) {
|
|
|
|
/* Operation in this page
|
|
|
|
*
|
|
|
|
* page_base = page offset within aperture
|
|
|
|
* page_offset = offset within page
|
|
|
|
* page_length = bytes to copy for this page
|
|
|
|
*/
|
|
|
|
u32 page_base = node.start;
|
|
|
|
unsigned page_offset = offset_in_page(offset);
|
|
|
|
unsigned page_length = PAGE_SIZE - page_offset;
|
|
|
|
page_length = remain < page_length ? remain : page_length;
|
2019-10-04 05:00:58 +08:00
|
|
|
if (drm_mm_node_allocated(&node)) {
|
2018-06-05 23:37:58 +08:00
|
|
|
ggtt->vm.insert_page(&ggtt->vm,
|
|
|
|
i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
|
|
|
|
node.start, I915_CACHE_NONE, 0);
|
2016-06-10 16:53:03 +08:00
|
|
|
} else {
|
|
|
|
page_base += offset & PAGE_MASK;
|
|
|
|
}
|
2016-10-28 20:58:39 +08:00
|
|
|
|
2017-12-11 23:18:20 +08:00
|
|
|
if (gtt_user_read(&ggtt->iomap, page_base, page_offset,
|
2016-10-28 20:58:39 +08:00
|
|
|
user_data, page_length)) {
|
2016-06-10 16:53:03 +08:00
|
|
|
ret = -EFAULT;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
remain -= page_length;
|
|
|
|
user_data += page_length;
|
|
|
|
offset += page_length;
|
|
|
|
}
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
i915_gem_object_unlock_fence(obj, fence);
|
2016-06-10 16:53:03 +08:00
|
|
|
out_unpin:
|
2019-10-04 05:00:58 +08:00
|
|
|
if (drm_mm_node_allocated(&node)) {
|
2018-06-05 23:37:58 +08:00
|
|
|
ggtt->vm.clear_range(&ggtt->vm, node.start, node.size);
|
drm/i915: Pull i915_vma_pin under the vm->mutex
Replace the struct_mutex requirement for pinning the i915_vma with the
local vm->mutex instead. Note that the vm->mutex is tainted by the
shrinker (we require unbinding from inside fs-reclaim) and so we cannot
allocate while holding that mutex. Instead we have to preallocate
workers to allocate and apply the PTE updates after we have
reserved their slot in the drm_mm (using fences to order the PTE writes
with the GPU work and with later unbind).
In adding the asynchronous vma binding, one subtle requirement is to
avoid coupling the binding fence into the backing object->resv. That is
the asynchronous binding only applies to the vma timeline itself and not
to the pages as that is a more global timeline (the binding of one vma
does not need to be ordered with another vma, nor does the implicit GEM
fencing depend on a vma, only on writes to the backing store). Keeping
the vma binding distinct from the backing store timelines is verified by
a number of async gem_exec_fence and gem_exec_schedule tests. The way we
do this is quite simple, we keep the fence for the vma binding separate
and only wait on it as required, and never add it to the obj->resv
itself.
Another consequence in reducing the locking around the vma is the
destruction of the vma is no longer globally serialised by struct_mutex.
A natural solution would be to add a kref to i915_vma, but that requires
decoupling the reference cycles, possibly by introducing a new
i915_mm_pages object that is owned by both obj->mm and vma->pages.
However, we have not taken that route due to the overshadowing lmem/ttm
discussions, and instead play a series of complicated games with
trylocks to (hopefully) ensure that only one destruction path is called!
v2: Add some commentary, and some helpers to reduce patch churn.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
2019-10-04 21:39:58 +08:00
|
|
|
remove_mappable_node(ggtt, &node);
|
2016-06-10 16:53:03 +08:00
|
|
|
} else {
|
2016-08-15 17:49:06 +08:00
|
|
|
i915_vma_unpin(vma);
|
2016-06-10 16:53:03 +08:00
|
|
|
}
|
2019-10-04 21:39:58 +08:00
|
|
|
out_rpm:
|
2019-06-14 07:21:54 +08:00
|
|
|
intel_runtime_pm_put(&i915->runtime_pm, wakeref);
|
2009-03-11 02:44:52 +08:00
|
|
|
return ret;
|
|
|
|
}
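The copy loop in the GTT read path above walks the requested range one page at a time, computing an offset within the page and a length clipped to the page boundary. The following is a minimal userspace sketch of that chunking arithmetic only. It assumes a 4096-byte page, and `SKETCH_PAGE_SIZE`, `struct sketch_chunk` and `sketch_split_pages` are hypothetical names that do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size, for illustration only */

struct sketch_chunk {
	uint64_t page_index;  /* which page of the object the chunk lies in */
	unsigned int offset;  /* byte offset within that page */
	unsigned int length;  /* bytes to copy from that page */
};

/*
 * Split [offset, offset + size) into per-page chunks, mirroring the
 * page_offset/page_length arithmetic of the loop above.
 * Returns the number of chunks written to out (at most max).
 */
static size_t sketch_split_pages(uint64_t offset, uint64_t size,
				 struct sketch_chunk *out, size_t max)
{
	uint64_t remain = size;
	size_t n = 0;

	assert(out != NULL);

	while (remain > 0 && n < max) {
		unsigned int page_offset =
			(unsigned int)(offset & (SKETCH_PAGE_SIZE - 1));
		unsigned int page_length = SKETCH_PAGE_SIZE - page_offset;

		/* Clip the final chunk to the bytes that are left. */
		if (remain < page_length)
			page_length = (unsigned int)remain;

		out[n].page_index = offset / SKETCH_PAGE_SIZE;
		out[n].offset = page_offset;
		out[n].length = page_length;
		n++;

		remain -= page_length;
		offset += page_length;
	}

	return n;
}
```

Only the first chunk can start at a non-zero offset within its page, and only the first and last chunks can be shorter than a full page.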
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/**
|
|
|
|
* Reads data from the object referenced by handle.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device pointer
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file pointer
|
2008-07-31 03:06:12 +08:00
|
|
|
*
|
|
|
|
* On error, the contents of *data are undefined.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_pread_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_pread *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2016-10-28 20:58:39 +08:00
|
|
|
int ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-11-17 17:10:42 +08:00
|
|
|
if (args->size == 0)
|
|
|
|
return 0;
|
|
|
|
|
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted in this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 10:57:57 +08:00
|
|
|
if (!access_ok(u64_to_user_ptr(args->data_ptr),
|
2010-11-17 17:10:42 +08:00
|
|
|
args->size))
|
|
|
|
return -EFAULT;
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-09-27 03:21:44 +08:00
|
|
|
/* Bounds check source. */
|
2016-12-14 04:32:22 +08:00
|
|
|
if (range_overflows_t(u64, args->offset, args->size, obj->base.size)) {
|
2010-09-27 03:50:05 +08:00
|
|
|
ret = -EINVAL;
|
2016-10-28 20:58:39 +08:00
|
|
|
goto out;
|
2010-09-27 03:50:05 +08:00
|
|
|
}
|
|
|
|
|
2011-02-03 19:57:46 +08:00
|
|
|
trace_i915_gem_object_pread(obj, args->offset, args->size);
|
|
|
|
|
2016-10-28 20:58:27 +08:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE,
|
2019-02-13 17:25:04 +08:00
|
|
|
MAX_SCHEDULE_TIMEOUT);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (ret)
|
2016-10-28 20:58:39 +08:00
|
|
|
goto out;
|
2016-08-05 17:14:16 +08:00
|
|
|
|
2016-10-28 20:58:39 +08:00
|
|
|
ret = i915_gem_object_pin_pages(obj);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (ret)
|
2016-10-28 20:58:39 +08:00
|
|
|
goto out;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-10-28 20:58:39 +08:00
|
|
|
ret = i915_gem_shmem_pread(obj, args);
|
2016-10-24 20:42:15 +08:00
|
|
|
if (ret == -EFAULT || ret == -ENODEV)
|
2016-10-28 20:58:39 +08:00
|
|
|
ret = i915_gem_gtt_pread(obj, args);
|
2016-06-10 16:53:03 +08:00
|
|
|
|
2016-10-28 20:58:39 +08:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
out:
|
2016-10-28 20:58:43 +08:00
|
|
|
i915_gem_object_put(obj);
|
2009-03-11 02:44:52 +08:00
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
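The bounds check in the ioctl above rejects a request whose offset and size do not fit inside the object. A naive `offset + size > object_size` test can wrap around for large 64-bit values, so the check has to be phrased without the addition. The following is a minimal userspace sketch of such an overflow-safe test. `sketch_range_overflows` is a hypothetical helper written for illustration, and it is an assumption that the driver's macro behaves the same way in every corner case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if [start, start + size) does not fit inside [0, max).
 * The subtraction max - start cannot wrap because it is only evaluated
 * when start < max.
 */
static bool sketch_range_overflows(uint64_t start, uint64_t size,
				   uint64_t max)
{
	return start >= max || size > max - start;
}

/* A few edge cases, including one where start + size would wrap. */
static void sketch_range_selfcheck(void)
{
	assert(!sketch_range_overflows(0, 4096, 4096));
	assert(sketch_range_overflows(1, 4096, 4096));
	assert(sketch_range_overflows(UINT64_MAX - 10, 20, 4096));
}
```

In the last case the sum `start + size` wraps to a small value, so the naive comparison would wrongly accept the request.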
|
|
|
|
|
2008-10-31 10:38:48 +08:00
|
|
|
/* This is the fast write path which cannot handle
|
|
|
|
* page faults in the source data
|
2008-10-21 05:16:43 +08:00
|
|
|
*/
|
2008-10-31 10:38:48 +08:00
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
static inline bool
|
|
|
|
ggtt_write(struct io_mapping *mapping,
|
|
|
|
loff_t base, int offset,
|
|
|
|
char __user *user_data, int length)
|
2008-10-21 05:16:43 +08:00
|
|
|
{
|
2017-09-02 01:12:52 +08:00
|
|
|
void __iomem *vaddr;
|
2008-10-31 10:38:48 +08:00
|
|
|
unsigned long unwritten;
|
2008-10-21 05:16:43 +08:00
|
|
|
|
2012-04-17 05:07:47 +08:00
|
|
|
/* We can use the cpu mem copy function because this is X86. */
|
2017-09-02 01:12:52 +08:00
|
|
|
vaddr = io_mapping_map_atomic_wc(mapping, base);
|
|
|
|
unwritten = __copy_from_user_inatomic_nocache((void __force *)vaddr + offset,
|
2008-10-31 10:38:48 +08:00
|
|
|
user_data, length);
|
2016-10-28 20:58:40 +08:00
|
|
|
io_mapping_unmap_atomic(vaddr);
|
|
|
|
if (unwritten) {
|
2017-09-02 01:12:52 +08:00
|
|
|
vaddr = io_mapping_map_wc(mapping, base, PAGE_SIZE);
|
|
|
|
unwritten = copy_from_user((void __force *)vaddr + offset,
|
|
|
|
user_data, length);
|
2016-10-28 20:58:40 +08:00
|
|
|
io_mapping_unmap(vaddr);
|
|
|
|
}
|
2016-10-28 20:58:39 +08:00
|
|
|
|
|
|
|
return unwritten;
|
|
|
|
}
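`ggtt_write()` above first attempts a copy that must not fault and, if any bytes are left uncopied, repeats the whole copy through a path that is allowed to fault. The following is a minimal userspace sketch of that try-fast-then-fall-back shape only. All names are hypothetical, and the two copy routines are stand-ins built on `memcpy`, not the kernel's user-copy primitives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy routine returning the number of bytes NOT copied (0 on success). */
typedef size_t (*sketch_copy_fn)(void *dst, const void *src, size_t len);

/* Stand-in for a fast copy that gives up partway through. */
static size_t sketch_fast_partial(void *dst, const void *src, size_t len)
{
	size_t done = len / 2;

	memcpy(dst, src, done);
	return len - done;
}

/* Stand-in for a slow copy that always completes. */
static size_t sketch_slow_full(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/*
 * Try the fast routine first. If it leaves bytes uncopied, redo the
 * whole range with the slow routine, as the function above does.
 */
static size_t sketch_copy_with_fallback(sketch_copy_fn fast,
					sketch_copy_fn slow,
					void *dst, const void *src,
					size_t len)
{
	size_t left;

	assert(fast != NULL && slow != NULL);

	left = fast(dst, src, len);
	if (left)
		left = slow(dst, src, len);

	return left;
}
```

Restarting from the beginning of the range keeps the fallback simple, at the cost of recopying the bytes the fast path already wrote.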
|
|
|
|
|
2009-03-10 00:42:23 +08:00
|
|
|
/**
|
|
|
|
* This is the fast pwrite path, where we copy the data directly from the
|
|
|
|
* user into the GTT, uncached.
|
2016-10-28 20:58:40 +08:00
|
|
|
* @obj: i915 GEM object
|
2016-06-03 21:02:17 +08:00
|
|
|
* @args: pwrite arguments structure
|
2009-03-10 00:42:23 +08:00
|
|
|
*/
|
2008-07-31 03:06:12 +08:00
|
|
|
static int
|
2016-10-28 20:58:40 +08:00
|
|
|
i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
|
|
|
|
const struct drm_i915_gem_pwrite *args)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2016-10-28 20:58:40 +08:00
|
|
|
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
2016-06-10 16:53:01 +08:00
|
|
|
struct i915_ggtt *ggtt = &i915->ggtt;
|
2019-06-14 07:21:54 +08:00
|
|
|
struct intel_runtime_pm *rpm = &i915->runtime_pm;
|
2019-01-14 22:21:18 +08:00
|
|
|
intel_wakeref_t wakeref;
|
2016-06-10 16:53:01 +08:00
|
|
|
struct drm_mm_node node;
|
2019-05-28 17:29:51 +08:00
|
|
|
struct dma_fence *fence;
|
2016-10-28 20:58:40 +08:00
|
|
|
struct i915_vma *vma;
|
|
|
|
u64 remain, offset;
|
|
|
|
void __user *user_data;
|
2016-06-10 16:53:01 +08:00
|
|
|
int ret;
|
2016-06-10 16:53:03 +08:00
|
|
|
|
2017-10-19 14:37:33 +08:00
|
|
|
if (i915_gem_object_has_struct_page(obj)) {
|
|
|
|
/*
|
|
|
|
* Avoid waking the device up if we can fall back, as
|
|
|
|
* waking/resuming is very slow (worst-case 10-100 ms
|
|
|
|
* depending on PCI sleeps and our own resume time).
|
|
|
|
* This easily dwarfs any performance advantage from
|
|
|
|
* using the cache bypass of indirect GGTT access.
|
|
|
|
*/
|
2019-06-14 07:21:54 +08:00
|
|
|
wakeref = intel_runtime_pm_get_if_in_use(rpm);
|
2019-10-04 21:39:58 +08:00
|
|
|
if (!wakeref)
|
|
|
|
return -EFAULT;
|
2017-10-19 14:37:33 +08:00
|
|
|
} else {
|
|
|
|
/* No backing pages, no fallback, we must force GGTT access */
|
2019-06-14 07:21:54 +08:00
|
|
|
wakeref = intel_runtime_pm_get(rpm);
|
2017-10-19 14:37:33 +08:00
|
|
|
}
|
|
|
|
|
2019-08-22 14:15:57 +08:00
|
|
|
vma = ERR_PTR(-ENODEV);
|
|
|
|
if (!i915_gem_object_is_tiled(obj))
|
|
|
|
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
|
|
|
|
PIN_MAPPABLE |
|
|
|
|
PIN_NONBLOCK /* NOWARN */ |
|
|
|
|
PIN_NOEVICT);
|
2016-08-19 00:16:45 +08:00
|
|
|
if (!IS_ERR(vma)) {
|
|
|
|
node.start = i915_ggtt_offset(vma);
|
2019-10-04 05:00:59 +08:00
|
|
|
node.flags = 0;
|
2019-08-22 14:15:57 +08:00
|
|
|
} else {
|
2016-10-28 20:58:39 +08:00
|
|
|
ret = insert_mappable_node(ggtt, &node, PAGE_SIZE);
|
2016-06-10 16:53:01 +08:00
|
|
|
if (ret)
|
2017-10-19 14:37:33 +08:00
|
|
|
goto out_rpm;
|
2019-10-04 05:00:58 +08:00
|
|
|
GEM_BUG_ON(!drm_mm_node_allocated(&node));
|
2016-06-10 16:53:01 +08:00
|
|
|
}
|
2012-03-26 01:47:35 +08:00
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
ret = i915_gem_object_lock_interruptible(obj);
|
2012-03-26 01:47:35 +08:00
|
|
|
if (ret)
|
|
|
|
goto out_unpin;
|
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, true);
|
|
|
|
if (ret) {
|
|
|
|
i915_gem_object_unlock(obj);
|
|
|
|
goto out_unpin;
|
|
|
|
}
|
|
|
|
|
|
|
|
fence = i915_gem_object_lock_fence(obj);
|
|
|
|
i915_gem_object_unlock(obj);
|
|
|
|
if (!fence) {
|
|
|
|
ret = -ENOMEM;
|
|
|
|
goto out_unpin;
|
|
|
|
}
|
2016-10-28 20:58:40 +08:00
|
|
|
|
2019-08-16 15:46:35 +08:00
|
|
|
intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CPU);
|
2015-02-14 03:23:45 +08:00
|
|
|
|
2016-06-10 16:53:01 +08:00
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
|
|
|
offset = args->offset;
|
|
|
|
remain = args->size;
|
|
|
|
while (remain) {
|
2008-07-31 03:06:12 +08:00
|
|
|
/* Operation in this page
|
|
|
|
*
|
2008-10-31 10:38:48 +08:00
|
|
|
* page_base = page offset within aperture
|
|
|
|
* page_offset = offset within page
|
|
|
|
* page_length = bytes to copy for this page
|
2008-07-31 03:06:12 +08:00
|
|
|
*/
|
2016-06-10 16:53:01 +08:00
|
|
|
u32 page_base = node.start;
|
2016-10-28 20:58:39 +08:00
|
|
|
unsigned int page_offset = offset_in_page(offset);
|
|
|
|
unsigned int page_length = PAGE_SIZE - page_offset;
|
2016-06-10 16:53:01 +08:00
|
|
|
page_length = remain < page_length ? remain : page_length;
|
2019-10-04 05:00:58 +08:00
|
|
|
if (drm_mm_node_allocated(&node)) {
|
2019-07-18 22:54:05 +08:00
|
|
|
/* flush the write before we modify the GGTT */
|
|
|
|
intel_gt_flush_ggtt_writes(ggtt->vm.gt);
|
2018-06-05 23:37:58 +08:00
|
|
|
ggtt->vm.insert_page(&ggtt->vm,
|
|
|
|
i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
|
|
|
|
node.start, I915_CACHE_NONE, 0);
|
2016-06-10 16:53:01 +08:00
|
|
|
wmb(); /* flush modifications to the GGTT (insert_page) */
|
|
|
|
} else {
|
|
|
|
page_base += offset & PAGE_MASK;
|
|
|
|
}
|
2008-10-31 10:38:48 +08:00
|
|
|
/* If we get a fault while copying data, then (presumably) our
|
2009-03-10 00:42:23 +08:00
|
|
|
* source page isn't available. Return the error and we'll
|
|
|
|
* retry in the slow path.
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 16:53:03 +08:00
|
|
|
* If the object is non-shmem backed, we retry with the
|
|
|
|
* path that handles page faults.
|
2008-10-31 10:38:48 +08:00
|
|
|
*/
|
2017-12-11 23:18:20 +08:00
|
|
|
if (ggtt_write(&ggtt->iomap, page_base, page_offset,
|
2016-10-28 20:58:40 +08:00
|
|
|
user_data, page_length)) {
|
|
|
|
ret = -EFAULT;
|
|
|
|
break;
|
2012-03-26 01:47:35 +08:00
|
|
|
}
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2008-10-31 10:38:48 +08:00
|
|
|
remain -= page_length;
|
|
|
|
user_data += page_length;
|
|
|
|
offset += page_length;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
2019-08-16 15:46:35 +08:00
|
|
|
intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU);
|
2016-10-28 20:58:40 +08:00
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
i915_gem_object_unlock_fence(obj, fence);
|
2012-03-26 01:47:35 +08:00
|
|
|
out_unpin:
|
2019-07-18 22:54:05 +08:00
|
|
|
intel_gt_flush_ggtt_writes(ggtt->vm.gt);
|
2019-10-04 05:00:58 +08:00
|
|
|
if (drm_mm_node_allocated(&node)) {
|
2018-06-05 23:37:58 +08:00
|
|
|
ggtt->vm.clear_range(&ggtt->vm, node.start, node.size);
|
2019-10-04 21:39:58 +08:00
|
|
|
remove_mappable_node(ggtt, &node);
|
2016-06-10 16:53:01 +08:00
|
|
|
} else {
|
2016-08-15 17:49:06 +08:00
|
|
|
i915_vma_unpin(vma);
|
2016-06-10 16:53:01 +08:00
|
|
|
}
|
2017-10-19 14:37:33 +08:00
|
|
|
out_rpm:
|
2019-06-14 07:21:54 +08:00
|
|
|
intel_runtime_pm_put(rpm, wakeref);
|
2009-03-10 00:42:23 +08:00
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
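The copy loop in the GGTT write path above processes the request one page at a time: each iteration computes the offset within the current page and copies at most the bytes left in that page. A minimal userspace sketch of just that chunking arithmetic follows. `SKETCH_PAGE_SIZE`, `struct chunk` and `split_into_page_chunks()` are hypothetical names introduced for illustration; they are not part of i915, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size, illustration only */

/* One per-page piece of a larger write (hypothetical helper type). */
struct chunk {
	uint64_t page_index;      /* which page of the object is touched */
	unsigned int page_offset; /* offset of the first byte within that page */
	unsigned int page_length; /* number of bytes copied into that page */
};

/*
 * Split the byte range [offset, offset + size) into per-page chunks, using
 * the same page_offset/page_length arithmetic as the loop above. Returns the
 * number of chunks stored in out[], which holds at most max_chunks entries.
 */
static size_t split_into_page_chunks(uint64_t offset, uint64_t size,
				     struct chunk *out, size_t max_chunks)
{
	uint64_t remain = size;
	size_t n = 0;

	assert(out != NULL);

	while (remain && n < max_chunks) {
		unsigned int page_offset = offset % SKETCH_PAGE_SIZE;
		unsigned int page_length = SKETCH_PAGE_SIZE - page_offset;

		/* Never copy more than what is left of the request. */
		if (remain < page_length)
			page_length = (unsigned int)remain;

		out[n].page_index = offset / SKETCH_PAGE_SIZE;
		out[n].page_offset = page_offset;
		out[n].page_length = page_length;
		n++;

		remain -= page_length;
		offset += page_length;
	}

	return n;
}
```

Only the first chunk can start at a non-zero page offset; every later chunk starts on a page boundary, which is why the shmem path further down simply resets its offset to 0 after the first iteration.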
|
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
/* Per-page copy function for the shmem pwrite fastpath.
|
|
|
|
* Flushes invalid cachelines before writing to the target if
|
|
|
|
* needs_clflush_before is set and flushes out any written cachelines after
|
|
|
|
* writing if needs_clflush_after is set.
|
|
|
|
*/
|
2009-03-10 04:42:30 +08:00
|
|
|
static int
|
2016-10-28 20:58:40 +08:00
|
|
|
shmem_pwrite(struct page *page, int offset, int len, char __user *user_data,
|
|
|
|
bool needs_clflush_before,
|
|
|
|
bool needs_clflush_after)
|
2009-03-10 04:42:30 +08:00
|
|
|
{
|
2019-01-05 20:07:58 +08:00
|
|
|
char *vaddr;
|
2016-10-28 20:58:40 +08:00
|
|
|
int ret;
|
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
vaddr = kmap(page);
|
2016-10-28 20:58:40 +08:00
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
if (needs_clflush_before)
|
|
|
|
drm_clflush_virt_range(vaddr + offset, len);
|
2016-10-28 20:58:40 +08:00
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
ret = __copy_from_user(vaddr + offset, user_data, len);
|
|
|
|
if (!ret && needs_clflush_after)
|
|
|
|
drm_clflush_virt_range(vaddr + offset, len);
|
2016-10-28 20:58:40 +08:00
|
|
|
|
2019-01-05 20:07:58 +08:00
|
|
|
kunmap(page);
|
|
|
|
|
|
|
|
return ret ? -EFAULT : 0;
|
2016-10-28 20:58:40 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
i915_gem_shmem_pwrite(struct drm_i915_gem_object *obj,
|
|
|
|
const struct drm_i915_gem_pwrite *args)
|
|
|
|
{
|
|
|
|
unsigned int partial_cacheline_write;
|
2016-08-19 00:16:47 +08:00
|
|
|
unsigned int needs_clflush;
|
2016-10-28 20:58:40 +08:00
|
|
|
unsigned int offset, idx;
|
2019-05-28 17:29:51 +08:00
|
|
|
struct dma_fence *fence;
|
|
|
|
void __user *user_data;
|
|
|
|
u64 remain;
|
2016-10-28 20:58:40 +08:00
|
|
|
int ret;
|
2009-03-10 04:42:30 +08:00
|
|
|
|
2019-05-28 17:29:48 +08:00
|
|
|
ret = i915_gem_object_prepare_write(obj, &needs_clflush);
|
2016-10-28 20:58:40 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
fence = i915_gem_object_lock_fence(obj);
|
|
|
|
i915_gem_object_finish_access(obj);
|
|
|
|
if (!fence)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
/* If we don't overwrite a cacheline completely we need to be
|
|
|
|
* careful to have up-to-date data by first clflushing. Don't
|
|
|
|
* overcomplicate things and flush the entire patch.
|
|
|
|
*/
|
|
|
|
partial_cacheline_write = 0;
|
|
|
|
if (needs_clflush & CLFLUSH_BEFORE)
|
|
|
|
partial_cacheline_write = boot_cpu_data.x86_clflush_size - 1;
|
2012-06-01 22:20:22 +08:00
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
|
|
|
remain = args->size;
|
|
|
|
offset = offset_in_page(args->offset);
|
|
|
|
for (idx = args->offset >> PAGE_SHIFT; remain; idx++) {
|
|
|
|
struct page *page = i915_gem_object_get_page(obj, idx);
|
2018-10-12 22:02:28 +08:00
|
|
|
unsigned int length = min_t(u64, remain, PAGE_SIZE - offset);
|
2012-09-05 04:02:55 +08:00
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
ret = shmem_pwrite(page, offset, length, user_data,
|
|
|
|
(offset | length) & partial_cacheline_write,
|
|
|
|
needs_clflush & CLFLUSH_AFTER);
|
2012-09-05 04:02:55 +08:00
|
|
|
if (ret)
|
2016-10-28 20:58:40 +08:00
|
|
|
break;
|
2012-09-05 04:02:55 +08:00
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
remain -= length;
|
|
|
|
user_data += length;
|
|
|
|
offset = 0;
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicity guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degradation.
- read/write access to mmaps: already fully racy, no degradation.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggested by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 20:57:31 +08:00
|
|
|
}
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2019-08-16 15:46:35 +08:00
|
|
|
intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU);
|
2019-05-28 17:29:51 +08:00
|
|
|
i915_gem_object_unlock_fence(obj, fence);
|
|
|
|
|
2009-03-10 04:42:30 +08:00
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
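The shmem write path above decides per page whether to flush before writing by testing `(offset | length) & partial_cacheline_write`, where the mask is the cacheline size minus one (or 0 when no pre-flush is needed at all). The test is non-zero exactly when the start or the length of the write is not a multiple of the cacheline size. A minimal sketch of that predicate, assuming a power-of-two cacheline size; `is_partial_cacheline_write()` is a hypothetical name, not a kernel function:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when a write of `length` bytes at `offset` does not cover
 * whole cachelines, i.e. when either its start or its length is not a
 * multiple of clflush_size. clflush_size must be a power of two.
 */
static bool is_partial_cacheline_write(unsigned int offset, unsigned int length,
				       unsigned int clflush_size)
{
	unsigned int mask;

	assert(clflush_size != 0 && (clflush_size & (clflush_size - 1)) == 0);

	/* OR-ing offset and length lets one mask test check both at once. */
	mask = clflush_size - 1;
	return ((offset | length) & mask) != 0;
}
```

OR-ing the two values is slightly conservative in form but exact in effect: if both are cacheline multiples the write covers only whole cachelines, otherwise at least one cacheline is only partly overwritten and its stale contents must be flushed first.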
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Writes data to the object referenced by handle.
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file
|
2008-07-31 03:06:12 +08:00
|
|
|
*
|
|
|
|
* On error, the contents of the buffer that were to be modified are undefined.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
|
2010-10-14 22:03:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_pwrite *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2010-11-17 17:10:42 +08:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (args->size == 0)
|
|
|
|
return 0;
|
|
|
|
|
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted in this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 10:57:57 +08:00
|
|
|
if (!access_ok(u64_to_user_ptr(args->data_ptr), args->size))
|
2010-11-17 17:10:42 +08:00
|
|
|
return -EFAULT;
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2010-09-27 03:21:44 +08:00
|
|
|
/* Bounds check destination. */
|
2016-12-14 04:32:22 +08:00
|
|
|
if (range_overflows_t(u64, args->offset, args->size, obj->base.size)) {
|
2010-09-27 03:50:05 +08:00
|
|
|
ret = -EINVAL;
|
2016-08-05 17:14:16 +08:00
|
|
|
goto err;
|
2010-09-27 03:50:05 +08:00
|
|
|
}
|
|
|
|
|
2018-07-13 02:53:14 +08:00
|
|
|
/* Writes not allowed into this read-only object */
|
|
|
|
if (i915_gem_object_is_readonly(obj)) {
|
|
|
|
ret = -EINVAL;
|
|
|
|
goto err;
|
|
|
|
}
|
|
|
|
|
2011-02-03 19:57:46 +08:00
|
|
|
trace_i915_gem_object_pwrite(obj, args->offset, args->size);
|
|
|
|
|
2017-03-07 20:03:38 +08:00
|
|
|
ret = -ENODEV;
|
|
|
|
if (obj->ops->pwrite)
|
|
|
|
ret = obj->ops->pwrite(obj, args);
|
|
|
|
if (ret != -ENODEV)
|
|
|
|
goto err;
|
|
|
|
|
2016-10-28 20:58:27 +08:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
I915_WAIT_ALL,
|
2019-02-13 17:25:04 +08:00
|
|
|
MAX_SCHEDULE_TIMEOUT);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
ret = i915_gem_object_pin_pages(obj);
|
2016-08-05 17:14:16 +08:00
|
|
|
if (ret)
|
2016-10-28 20:58:40 +08:00
|
|
|
goto err;
|
2016-08-05 17:14:16 +08:00
|
|
|
|
2012-03-26 01:47:35 +08:00
|
|
|
ret = -EFAULT;
|
2008-07-31 03:06:12 +08:00
|
|
|
/* We can only do the GTT pwrite on untiled buffers, as otherwise
|
|
|
|
* it would end up going through the fenced access, and we'll get
|
|
|
|
* different detiling behavior between reading and writing.
|
|
|
|
* pread/pwrite currently are reading and writing from the CPU
|
|
|
|
* perspective, requiring manual detiling by the client.
|
|
|
|
*/
|
2016-06-20 22:05:52 +08:00
|
|
|
if (!i915_gem_object_has_struct_page(obj) ||
|
2016-10-24 20:42:15 +08:00
|
|
|
cpu_write_needs_clflush(obj))
|
2012-03-26 01:47:35 +08:00
|
|
|
/* Note that the gtt paths might fail with non-page-backed user
|
|
|
|
* pointers (e.g. gtt mappings when moving data between
|
2016-10-24 20:42:15 +08:00
|
|
|
* textures). Fallback to the shmem path in that case.
|
|
|
|
*/
|
2016-10-28 20:58:40 +08:00
|
|
|
ret = i915_gem_gtt_pwrite_fast(obj, args);
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2016-07-17 01:42:36 +08:00
|
|
|
if (ret == -EFAULT || ret == -ENOSPC) {
|
2014-11-04 20:51:40 +08:00
|
|
|
if (obj->phys_handle)
|
|
|
|
ret = i915_gem_phys_pwrite(obj, args, file);
|
2016-06-10 16:53:03 +08:00
|
|
|
else
|
2016-10-28 20:58:40 +08:00
|
|
|
ret = i915_gem_shmem_pwrite(obj, args);
|
2014-11-04 20:51:40 +08:00
|
|
|
}
|
2011-12-14 20:57:30 +08:00
|
|
|
|
2016-10-28 20:58:40 +08:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
2016-08-05 17:14:16 +08:00
|
|
|
err:
|
2016-10-28 20:58:43 +08:00
|
|
|
i915_gem_object_put(obj);
|
2016-08-05 17:14:16 +08:00
|
|
|
return ret;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
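The bounds check in the ioctl above has to reject requests where `offset + size` would exceed the object size, without being fooled by the addition wrapping around. The sketch below shows one overflow-safe way to write such a check. It is an illustration of the idea only: `sketch_range_overflows()` is a hypothetical name, and the driver's actual `range_overflows_t()` macro is defined elsewhere and may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if [start, start + size) does not fit inside [0, max).
 * The subtraction max - start is only evaluated when start < max, so it
 * cannot wrap, unlike a naive "start + size > max" comparison.
 */
static bool sketch_range_overflows(uint64_t start, uint64_t size, uint64_t max)
{
	return start >= max || size > max - start;
}
```

With a naive addition, a request such as `start = UINT64_MAX, size = 2` would wrap to 1 and pass the check; the form above rejects it.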
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Called when user space has done writes to this buffer
|
2016-06-03 21:02:17 +08:00
|
|
|
* @dev: drm device
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file
|
2008-07-31 03:06:12 +08:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_file *file)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_sw_finish *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2010-10-17 16:45:41 +08:00
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 17:14:19 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 03:06:12 +08:00
|
|
|
|
2017-11-14 18:25:13 +08:00
|
|
|
/*
|
|
|
|
* Proxy objects are barred from CPU access, so there is no
|
|
|
|
* need to ban sw_finish as it is a nop.
|
|
|
|
*/
|
|
|
|
|
2008-07-31 03:06:12 +08:00
|
|
|
/* Pinned buffers may be scanout, so flush the cache */
|
2017-02-22 19:40:46 +08:00
|
|
|
i915_gem_object_flush_if_display(obj);
|
2016-10-28 20:58:43 +08:00
|
|
|
i915_gem_object_put(obj);
|
2017-02-22 19:40:46 +08:00
|
|
|
|
|
|
|
return 0;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
|
|
|
|
2019-06-13 15:32:54 +08:00
|
|
|
void i915_gem_runtime_suspend(struct drm_i915_private *i915)
|
2014-06-16 15:57:44 +08:00
|
|
|
{
|
2016-10-24 20:42:16 +08:00
|
|
|
struct drm_i915_gem_object *obj, *on;
|
2016-10-24 20:42:18 +08:00
|
|
|
int i;
|
2014-06-16 15:57:44 +08:00
|
|
|
|
2016-10-24 20:42:16 +08:00
|
|
|
/*
|
|
|
|
* Only called during RPM suspend. All users of the userfault_list
|
|
|
|
* must be holding an RPM wakeref to ensure that this can not
|
|
|
|
* run concurrently with themselves (and use the struct_mutex for
|
|
|
|
* protection between themselves).
|
|
|
|
*/
|
2016-10-24 20:42:14 +08:00
|
|
|
|
2016-10-24 20:42:16 +08:00
|
|
|
list_for_each_entry_safe(obj, on,
|
2019-06-13 15:32:54 +08:00
|
|
|
&i915->ggtt.userfault_list, userfault_link)
|
2017-10-09 16:43:57 +08:00
|
|
|
__i915_gem_object_release_mmap(obj);
|
2016-10-24 20:42:18 +08:00
|
|
|
|
2019-06-13 15:32:54 +08:00
|
|
|
/*
|
|
|
|
* The fence will be lost when the device powers down. If any were
|
2016-10-24 20:42:18 +08:00
|
|
|
* in use by hardware (i.e. they are pinned), we should not be powering
|
|
|
|
* down! All other fences will be reacquired by the user upon waking.
|
|
|
|
*/
|
2019-06-13 15:32:54 +08:00
|
|
|
for (i = 0; i < i915->ggtt.num_fences; i++) {
|
|
|
|
struct i915_fence_reg *reg = &i915->ggtt.fence_regs[i];
|
2016-10-24 20:42:18 +08:00
|
|
|
|
2019-06-13 15:32:54 +08:00
|
|
|
/*
|
|
|
|
* Ideally we want to assert that the fence register is not
|
2017-02-03 20:57:17 +08:00
|
|
|
* live at this point (i.e. that no piece of code will be
|
|
|
|
* trying to write through fence + GTT, as that both violates
|
|
|
|
* our tracking of activity and associated locking/barriers,
|
|
|
|
* but also is illegal given that the hw is powered down).
|
|
|
|
*
|
|
|
|
* Previously we used reg->pin_count as a "liveness" indicator.
|
|
|
|
* That is not sufficient, and we need a more fine-grained
|
|
|
|
* tool if we want to have a sanity check here.
|
|
|
|
*/
|
2016-10-24 20:42:18 +08:00
|
|
|
|
|
|
|
if (!reg->vma)
|
|
|
|
continue;
|
|
|
|
|
2017-10-09 16:43:57 +08:00
|
|
|
GEM_BUG_ON(i915_vma_has_userfault(reg->vma));
|
2016-10-24 20:42:18 +08:00
|
|
|
reg->dirty = true;
|
|
|
|
}
|
2014-06-16 15:57:44 +08:00
|
|
|
}
|
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
struct i915_vma *
|
2015-03-16 20:11:13 +08:00
|
|
|
i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
|
|
|
|
const struct i915_ggtt_view *view,
|
2016-08-04 23:32:23 +08:00
|
|
|
u64 size,
|
2016-08-04 23:32:22 +08:00
|
|
|
u64 alignment,
|
|
|
|
u64 flags)
|
2015-03-16 20:11:13 +08:00
|
|
|
{
|
2016-10-13 16:55:04 +08:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2018-06-05 23:37:58 +08:00
|
|
|
struct i915_address_space *vm = &dev_priv->ggtt.vm;
|
2016-08-04 23:32:31 +08:00
|
|
|
struct i915_vma *vma;
|
|
|
|
int ret;
|
2016-03-30 21:57:10 +08:00
|
|
|
|
2019-09-28 16:25:46 +08:00
|
|
|
if (i915_gem_object_never_bind_ggtt(obj))
|
|
|
|
return ERR_PTR(-ENODEV);
|
|
|
|
|
2018-02-20 21:42:05 +08:00
|
|
|
if (flags & PIN_MAPPABLE &&
|
|
|
|
(!view || view->type == I915_GGTT_VIEW_NORMAL)) {
|
2017-10-09 16:44:01 +08:00
|
|
|
/* If the required space is larger than the available
|
|
|
|
* aperture, we will not be able to find a slot for the
|
|
|
|
* object and unbinding the object now will be in
|
|
|
|
* vain. Worse, doing so may cause us to ping-pong
|
|
|
|
* the object in and out of the Global GTT and
|
|
|
|
* waste a lot of cycles under the mutex.
|
|
|
|
*/
|
|
|
|
if (obj->base.size > dev_priv->ggtt.mappable_end)
|
|
|
|
return ERR_PTR(-E2BIG);
|
|
|
|
|
|
|
|
/* If NONBLOCK is set the caller is optimistically
|
|
|
|
* trying to cache the full object within the mappable
|
|
|
|
* aperture, and *must* have a fallback in place for
|
|
|
|
* situations where we cannot bind the object. We
|
|
|
|
* can be a little more lax here and use the fallback
|
|
|
|
* more often to avoid costly migrations of ourselves
|
|
|
|
* and other objects within the aperture.
|
|
|
|
*
|
|
|
|
* Half-the-aperture is used as a simple heuristic.
|
|
|
|
* More interesting would be to search for a free
|
|
|
|
* block prior to making the commitment to unbind.
|
|
|
|
* That caters for the self-harm case, and with a
|
|
|
|
* little more heuristics (e.g. NOFAULT, NOEVICT)
|
|
|
|
* we could try to minimise harm to others.
|
|
|
|
*/
|
|
|
|
if (flags & PIN_NONBLOCK &&
|
|
|
|
obj->base.size > dev_priv->ggtt.mappable_end / 2)
|
|
|
|
return ERR_PTR(-ENOSPC);
|
|
|
|
}
|
|
|
|
|
2017-01-16 23:21:28 +08:00
|
|
|
vma = i915_vma_instance(obj, vm, view);
|
2019-02-21 10:08:19 +08:00
|
|
|
if (IS_ERR(vma))
|
2016-08-15 17:49:06 +08:00
|
|
|
return vma;
|
2016-08-04 23:32:31 +08:00
|
|
|
|
|
|
|
if (i915_vma_misplaced(vma, size, alignment, flags)) {
|
2017-10-09 16:44:01 +08:00
|
|
|
if (flags & PIN_NONBLOCK) {
|
|
|
|
if (i915_vma_is_pinned(vma) || i915_vma_is_active(vma))
|
|
|
|
return ERR_PTR(-ENOSPC);
|
2016-08-04 23:32:31 +08:00
|
|
|
|
2017-10-09 16:44:01 +08:00
|
|
|
if (flags & PIN_MAPPABLE &&
|
2017-01-10 00:16:11 +08:00
|
|
|
vma->fence_size > dev_priv->ggtt.mappable_end / 2)
|
2016-10-13 16:55:04 +08:00
|
|
|
return ERR_PTR(-ENOSPC);
|
|
|
|
}
|
|
|
|
|
2016-08-04 23:32:31 +08:00
|
|
|
ret = i915_vma_unbind(vma);
|
|
|
|
if (ret)
|
2016-08-15 17:49:06 +08:00
|
|
|
return ERR_PTR(ret);
|
2016-08-04 23:32:31 +08:00
|
|
|
}
|
|
|
|
|
2019-08-23 23:39:44 +08:00
|
|
|
if (vma->fence && !i915_gem_object_is_tiled(obj)) {
|
|
|
|
mutex_lock(&vma->vm->mutex);
|
|
|
|
ret = i915_vma_revoke_fence(vma);
|
|
|
|
mutex_unlock(&vma->vm->mutex);
|
|
|
|
if (ret)
|
|
|
|
return ERR_PTR(ret);
|
|
|
|
}
|
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
|
|
|
|
if (ret)
|
|
|
|
return ERR_PTR(ret);
|
2015-03-16 20:11:13 +08:00
|
|
|
|
2016-08-15 17:49:06 +08:00
|
|
|
return vma;
|
2008-07-31 03:06:12 +08:00
|
|
|
}
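Before looking up the vma, the pin function above applies two size-based early exits for mappable pins: an object larger than the mappable aperture can never fit (-E2BIG), and with the non-blocking flag an object larger than half the aperture is refused (-ENOSPC) so that the caller uses its fallback instead of evicting others. A minimal sketch of that decision follows. The flag values and the name `sketch_mappable_precheck()` are hypothetical, and the sketch leaves out the additional condition in the source that the view is the normal one.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PIN_MAPPABLE (1u << 0) /* hypothetical flag values, */
#define SKETCH_PIN_NONBLOCK (1u << 1) /* not the driver's PIN_* bits */

/*
 * Returns 0 if the pin attempt may proceed, or a negative errno if the
 * object should be rejected up front based on its size alone.
 */
static int sketch_mappable_precheck(uint64_t obj_size, uint64_t mappable_end,
				    unsigned int flags)
{
	if (!(flags & SKETCH_PIN_MAPPABLE))
		return 0;

	/* Larger than the whole aperture: unbinding anything is in vain. */
	if (obj_size > mappable_end)
		return -E2BIG;

	/* Optimistic callers have a fallback; use it for large objects. */
	if ((flags & SKETCH_PIN_NONBLOCK) && obj_size > mappable_end / 2)
		return -ENOSPC;

	return 0;
}
```

The half-aperture threshold is the simple heuristic named in the source comment; the sketch does not attempt the free-block search that the comment suggests as a possible refinement.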
|
|
|
|
|
2009-09-14 23:50:29 +08:00
|
|
|
int
|
|
|
|
i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv)
|
|
|
|
{
|
2019-05-31 04:34:59 +08:00
|
|
|
struct drm_i915_private *i915 = to_i915(dev);
|
2009-09-14 23:50:29 +08:00
|
|
|
struct drm_i915_gem_madvise *args = data;
|
2010-11-09 03:18:58 +08:00
|
|
|
struct drm_i915_gem_object *obj;
|
2016-10-28 20:58:37 +08:00
|
|
|
int err;
|
2009-09-14 23:50:29 +08:00
|
|
|
|
|
|
|
switch (args->madv) {
|
|
|
|
case I915_MADV_DONTNEED:
|
|
|
|
case I915_MADV_WILLNEED:
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2016-07-20 20:31:51 +08:00
|
|
|
obj = i915_gem_object_lookup(file_priv, args->handle);
|
2016-10-28 20:58:37 +08:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
|
|
|
|
|
|
|
err = mutex_lock_interruptible(&obj->mm.lock);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
2009-09-14 23:50:29 +08:00
|
|
|
|
2017-10-14 04:26:13 +08:00
|
|
|
if (i915_gem_object_has_pages(obj) &&
|
2016-08-05 17:14:23 +08:00
|
|
|
i915_gem_object_is_tiled(obj) &&
|
2019-05-31 04:34:59 +08:00
|
|
|
i915->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
|
2016-11-01 18:03:17 +08:00
|
|
|
if (obj->mm.madv == I915_MADV_WILLNEED) {
|
|
|
|
GEM_BUG_ON(!obj->mm.quirked);
|
2016-10-28 20:58:35 +08:00
|
|
|
__i915_gem_object_unpin_pages(obj);
|
2016-11-01 18:03:17 +08:00
|
|
|
obj->mm.quirked = false;
|
|
|
|
}
|
|
|
|
if (args->madv == I915_MADV_WILLNEED) {
|
2016-11-04 18:30:01 +08:00
|
|
|
GEM_BUG_ON(obj->mm.quirked);
|
2016-10-28 20:58:35 +08:00
|
|
|
__i915_gem_object_pin_pages(obj);
|
2016-11-01 18:03:17 +08:00
|
|
|
obj->mm.quirked = true;
|
|
|
|
}
|
2014-11-20 16:26:30 +08:00
|
|
|
}
|
|
|
|
|
2016-10-28 20:58:35 +08:00
|
|
|
if (obj->mm.madv != __I915_MADV_PURGED)
|
|
|
|
obj->mm.madv = args->madv;
|
2009-09-14 23:50:29 +08:00
|
|
|
|
2019-05-31 04:34:59 +08:00
|
|
|
if (i915_gem_object_has_pages(obj)) {
|
|
|
|
struct list_head *list;
|
|
|
|
|
2019-05-31 04:35:00 +08:00
|
|
|
if (i915_gem_object_is_shrinkable(obj)) {
|
2019-06-10 22:54:30 +08:00
|
|
|
unsigned long flags;
|
|
|
|
|
|
|
|
spin_lock_irqsave(&i915->mm.obj_lock, flags);
|
|
|
|
|
2019-05-31 04:35:00 +08:00
|
|
|
if (obj->mm.madv != I915_MADV_WILLNEED)
|
|
|
|
list = &i915->mm.purge_list;
|
|
|
|
else
|
2019-06-12 18:57:20 +08:00
|
|
|
list = &i915->mm.shrink_list;
|
2019-05-31 04:35:00 +08:00
|
|
|
list_move_tail(&obj->mm.link, list);
|
2019-06-10 22:54:30 +08:00
|
|
|
|
|
|
|
spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
|
2019-05-31 04:35:00 +08:00
|
|
|
}
|
2019-05-31 04:34:59 +08:00
|
|
|
}
|
|
|
|
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i915_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superfluous
and only helps in OOM corner-cases, not fragmented-gtt thrashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 17:40:46 +08:00
|
|
|
/* if the object is no longer attached, discard its backing storage */
|
2017-10-14 04:26:13 +08:00
|
|
|
if (obj->mm.madv == I915_MADV_DONTNEED &&
|
|
|
|
!i915_gem_object_has_pages(obj))
|
2019-05-28 17:29:46 +08:00
|
|
|
i915_gem_object_truncate(obj);
|
2009-09-21 06:13:10 +08:00
|
|
|
|
2016-10-28 20:58:35 +08:00
|
|
|
args->retained = obj->mm.madv != __I915_MADV_PURGED;
|
2016-10-28 20:58:37 +08:00
|
|
|
mutex_unlock(&obj->mm.lock);
|
2009-09-22 21:24:13 +08:00
|
|
|
|
2016-10-28 20:58:37 +08:00
|
|
|
out:
|
2016-07-20 20:31:53 +08:00
|
|
|
i915_gem_object_put(obj);
|
2016-10-28 20:58:37 +08:00
|
|
|
return err;
|
2009-09-14 23:50:29 +08:00
|
|
|
}
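The tail of the madvise path above makes two decisions: which list a page-backed, shrinkable object is moved to, and whether its backing storage can be discarded straight away. A minimal sketch of just that decision logic follows, assuming simplified stand-in types; every `sketch_*` name is hypothetical and not part of the driver, and the `obj_lock` spinlock is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's madvise states. */
enum sketch_madv {
	SKETCH_MADV_WILLNEED,
	SKETCH_MADV_DONTNEED,
	SKETCH_MADV_PURGED
};

/* Hypothetical tags for the two lists named in the code above. */
enum sketch_list {
	SKETCH_LIST_NONE,
	SKETCH_LIST_SHRINK,
	SKETCH_LIST_PURGE
};

struct sketch_obj {
	enum sketch_madv madv;
	int has_pages;   /* non-zero if backing pages are present */
	int shrinkable;  /* non-zero if the shrinker may reclaim it */
};

/* Pick the list an object belongs on after its madvise state changes. */
static enum sketch_list sketch_pick_list(const struct sketch_obj *obj)
{
	assert(obj != NULL);

	/* Only objects with pages that the shrinker can reclaim are listed. */
	if (!obj->has_pages || !obj->shrinkable)
		return SKETCH_LIST_NONE;

	/* Anything not marked WILLNEED is a candidate for purging. */
	return obj->madv != SKETCH_MADV_WILLNEED ?
		SKETCH_LIST_PURGE : SKETCH_LIST_SHRINK;
}

/* An object without pages that userspace no longer needs can be truncated. */
static int sketch_should_truncate(const struct sketch_obj *obj)
{
	assert(obj != NULL);
	return obj->madv == SKETCH_MADV_DONTNEED && !obj->has_pages;
}
```

In this sketch a WILLNEED object with pages lands on the shrink list, a DONTNEED object with pages lands on the purge list, and a DONTNEED object without pages is reported as ready for truncation.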
|
|
|
|
|
2017-01-24 19:01:35 +08:00
|
|
|
void i915_gem_sanitize(struct drm_i915_private *i915)
|
|
|
|
{
|
2019-01-14 22:21:18 +08:00
|
|
|
intel_wakeref_t wakeref;
|
|
|
|
|
2018-05-31 16:22:45 +08:00
|
|
|
GEM_TRACE("\n");
|
|
|
|
|
2019-06-14 07:21:54 +08:00
|
|
|
wakeref = intel_runtime_pm_get(&i915->runtime_pm);
|
2019-03-20 02:35:36 +08:00
|
|
|
intel_uncore_forcewake_get(&i915->uncore, FORCEWAKE_ALL);
|
2018-05-31 16:22:45 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* As we have just resumed the machine and woken the device up from
|
|
|
|
* deep PCI sleep (presumably D3_cold), assume the HW has been reset
|
|
|
|
* back to defaults, recovering from whatever wedged state we left it
|
|
|
|
* in and so worth trying to use the device once more.
|
|
|
|
*/
|
2019-07-13 03:29:53 +08:00
|
|
|
if (intel_gt_is_wedged(&i915->gt))
|
|
|
|
intel_gt_unset_wedged(&i915->gt);
|
2017-08-26 19:09:34 +08:00
|
|
|
|
2017-01-24 19:01:35 +08:00
|
|
|
/*
|
|
|
|
* If we inherit context state from the BIOS or earlier occupants
|
|
|
|
* of the GPU, the GPU may be in an inconsistent state when we
|
|
|
|
* try to take over. The only way to remove the earlier state
|
|
|
|
* is by resetting. However, resetting on earlier gen is tricky as
|
|
|
|
* it may impact the display and we are uncertain about the stability
|
2017-04-28 15:53:38 +08:00
|
|
|
* of the reset, so this could be applied to even earlier gen.
|
2017-01-24 19:01:35 +08:00
|
|
|
*/
|
2019-06-25 21:01:10 +08:00
|
|
|
intel_gt_sanitize(&i915->gt, false);
|
2018-05-31 16:22:45 +08:00
|
|
|
|
2019-03-20 02:35:36 +08:00
|
|
|
intel_uncore_forcewake_put(&i915->uncore, FORCEWAKE_ALL);
|
2019-06-14 07:21:54 +08:00
|
|
|
intel_runtime_pm_put(&i915->runtime_pm, wakeref);
|
2017-01-24 19:01:35 +08:00
|
|
|
}
|
|
|
|
|
2017-11-10 22:26:33 +08:00
|
|
|
static int __intel_engines_record_defaults(struct drm_i915_private *i915)
|
|
|
|
{
|
2019-08-08 19:06:11 +08:00
|
|
|
struct i915_request *requests[I915_NUM_ENGINES] = {};
|
2017-11-10 22:26:33 +08:00
|
|
|
struct intel_engine_cs *engine;
|
|
|
|
enum intel_engine_id id;
|
2019-03-08 17:36:55 +08:00
|
|
|
int err = 0;
|
2017-11-10 22:26:33 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* As we reset the gpu during very early sanitisation, the current
|
|
|
|
* register state on the GPU should reflect its default values.
|
|
|
|
* We load a context onto the hw (with restore-inhibit), then switch
|
|
|
|
* over to a second context to save that default register state. We
|
|
|
|
* can then prime every new context with that state so they all start
|
|
|
|
* from the same default HW values.
|
|
|
|
*/
|
|
|
|
|
|
|
|
for_each_engine(engine, i915, id) {
|
2019-08-08 19:06:11 +08:00
|
|
|
struct intel_context *ce;
|
2018-02-21 17:56:36 +08:00
|
|
|
struct i915_request *rq;
|
2017-11-10 22:26:33 +08:00
|
|
|
|
2019-08-08 19:06:11 +08:00
|
|
|
/* We must be able to switch to something! */
|
|
|
|
GEM_BUG_ON(!engine->kernel_context);
|
|
|
|
engine->serial++; /* force the kernel context switch */
|
|
|
|
|
|
|
|
ce = intel_context_create(i915->kernel_context, engine);
|
|
|
|
if (IS_ERR(ce)) {
|
|
|
|
err = PTR_ERR(ce);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2019-04-27 00:33:34 +08:00
|
|
|
rq = intel_context_create_request(ce);
|
2017-11-10 22:26:33 +08:00
|
|
|
if (IS_ERR(rq)) {
|
|
|
|
err = PTR_ERR(rq);
|
2019-08-08 19:06:11 +08:00
|
|
|
intel_context_put(ce);
|
|
|
|
goto out;
|
2017-11-10 22:26:33 +08:00
|
|
|
}
|
|
|
|
|
2019-07-29 19:37:20 +08:00
|
|
|
err = intel_engine_emit_ctx_wa(rq);
|
|
|
|
if (err)
|
|
|
|
goto err_rq;
|
|
|
|
|
|
|
|
err = intel_renderstate_emit(rq);
|
|
|
|
if (err)
|
|
|
|
goto err_rq;
|
2017-11-10 22:26:33 +08:00
|
|
|
|
2019-07-29 19:37:20 +08:00
|
|
|
err_rq:
|
2019-08-08 19:06:11 +08:00
|
|
|
requests[id] = i915_request_get(rq);
|
2018-06-12 18:51:35 +08:00
|
|
|
i915_request_add(rq);
|
2017-11-10 22:26:33 +08:00
|
|
|
if (err)
|
2019-08-08 19:06:11 +08:00
|
|
|
goto out;
|
2017-11-10 22:26:33 +08:00
|
|
|
}
|
|
|
|
|
2019-03-08 17:36:55 +08:00
|
|
|
/* Flush the default context image to memory, and enable powersaving. */
|
2019-04-25 04:07:14 +08:00
|
|
|
if (!i915_gem_load_power_context(i915)) {
|
2019-03-08 17:36:55 +08:00
|
|
|
err = -EIO;
|
2019-08-08 19:06:11 +08:00
|
|
|
goto out;
|
2018-07-09 20:20:43 +08:00
|
|
|
}
|
2017-11-10 22:26:33 +08:00
|
|
|
|
2019-08-08 19:06:11 +08:00
|
|
|
for (id = 0; id < ARRAY_SIZE(requests); id++) {
|
|
|
|
struct i915_request *rq;
|
|
|
|
struct i915_vma *state;
|
2018-09-14 20:35:03 +08:00
|
|
|
void *vaddr;
|
2017-11-10 22:26:33 +08:00
|
|
|
|
2019-08-08 19:06:11 +08:00
|
|
|
rq = requests[id];
|
|
|
|
if (!rq)
|
2017-11-10 22:26:33 +08:00
|
|
|
continue;
|
|
|
|
|
2019-08-08 19:06:11 +08:00
|
|
|
/* We want to be able to unbind the state from the GGTT */
|
|
|
|
GEM_BUG_ON(intel_context_is_pinned(rq->hw_context));
|
|
|
|
|
|
|
|
state = rq->hw_context->state;
|
|
|
|
if (!state)
|
|
|
|
continue;
|
2019-03-08 21:25:19 +08:00
|
|
|
|
2017-11-10 22:26:33 +08:00
|
|
|
/*
|
|
|
|
* As we will hold a reference to the logical state, it will
|
|
|
|
* not be torn down with the context, and importantly the
|
|
|
|
* object will hold onto its vma (making it possible for a
|
|
|
|
* stray GTT write to corrupt our defaults). Unmap the vma
|
|
|
|
* from the GTT to prevent such accidents and reclaim the
|
|
|
|
* space.
|
|
|
|
*/
|
|
|
|
err = i915_vma_unbind(state);
|
|
|
|
if (err)
|
2019-08-08 19:06:11 +08:00
|
|
|
goto out;
|
2017-11-10 22:26:33 +08:00
|
|
|
|
2019-05-28 17:29:51 +08:00
|
|
|
i915_gem_object_lock(state->obj);
|
2017-11-10 22:26:33 +08:00
|
|
|
err = i915_gem_object_set_to_cpu_domain(state->obj, false);
|
2019-05-28 17:29:51 +08:00
|
|
|
i915_gem_object_unlock(state->obj);
|
2017-11-10 22:26:33 +08:00
|
|
|
if (err)
|
2019-08-08 19:06:11 +08:00
|
|
|
goto out;
|
2017-11-10 22:26:33 +08:00
|
|
|
|
2019-08-08 19:06:11 +08:00
|
|
|
i915_gem_object_set_cache_coherency(state->obj, I915_CACHE_LLC);
|
2018-09-14 20:35:03 +08:00
|
|
|
|
|
|
|
/* Check we can acquire the image of the context state */
|
2019-08-08 19:06:11 +08:00
|
|
|
vaddr = i915_gem_object_pin_map(state->obj, I915_MAP_FORCE_WB);
|
2018-09-14 20:35:03 +08:00
|
|
|
if (IS_ERR(vaddr)) {
|
|
|
|
err = PTR_ERR(vaddr);
|
2019-08-08 19:06:11 +08:00
|
|
|
goto out;
|
2018-09-14 20:35:03 +08:00
|
|
|
}
|
|
|
|
|
2019-08-08 19:06:11 +08:00
|
|
|
rq->engine->default_state = i915_gem_object_get(state->obj);
|
|
|
|
i915_gem_object_unpin_map(state->obj);
|
2017-11-10 22:26:33 +08:00
|
|
|
}
|
|
|
|
|
2019-08-08 19:06:11 +08:00
|
|
|
out:
|
2017-11-10 22:26:33 +08:00
|
|
|
/*
|
|
|
|
* If we have to abandon now, we expect the engines to be idle
|
2019-03-08 17:36:55 +08:00
|
|
|
* and ready to be torn-down. The quickest way we can accomplish
|
|
|
|
* this is by declaring ourselves wedged.
|
2017-11-10 22:26:33 +08:00
|
|
|
*/
|
2019-08-08 19:06:11 +08:00
|
|
|
if (err)
|
|
|
|
intel_gt_set_wedged(&i915->gt);
|
|
|
|
|
|
|
|
for (id = 0; id < ARRAY_SIZE(requests); id++) {
|
|
|
|
struct intel_context *ce;
|
|
|
|
struct i915_request *rq;
|
|
|
|
|
|
|
|
rq = requests[id];
|
|
|
|
if (!rq)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
ce = rq->hw_context;
|
|
|
|
i915_request_put(rq);
|
|
|
|
intel_context_put(ce);
|
|
|
|
}
|
|
|
|
return err;
|
2017-11-10 22:26:33 +08:00
|
|
|
}
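The function above collects one request per engine in a zero-initialised array, and its `out:` path walks that array again so every reference taken is dropped whether or not an earlier step failed. A minimal sketch of that "fill an array, release every non-NULL slot on exit" shape follows, under the assumption of a toy counter in place of real reference counting; all `sketch_*` names and the `fail_at` parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_NUM_ENGINES 4

/* Hypothetical request: counts how often a reference was taken/dropped. */
struct sketch_request {
	int gets;
	int puts;
};

/*
 * Take one reference per engine, then release them all on the way out.
 * fail_at is the engine index at which setup fails, or -1 for no failure.
 */
static int sketch_record_defaults(struct sketch_request *pool, int fail_at)
{
	struct sketch_request *requests[SKETCH_NUM_ENGINES] = { NULL };
	int id, err = 0;

	for (id = 0; id < SKETCH_NUM_ENGINES; id++) {
		if (id == fail_at) {
			err = -EIO;
			goto out;
		}
		pool[id].gets++;
		requests[id] = &pool[id];
	}

out:
	/* Slots that were never filled stay NULL and are skipped. */
	for (id = 0; id < SKETCH_NUM_ENGINES; id++) {
		if (!requests[id])
			continue;
		requests[id]->puts++;
	}
	return err;
}
```

Because the array starts out all NULL, a failure part-way through the first loop needs no separate bookkeeping: the cleanup loop releases exactly the slots that were filled.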
|
|
|
|
|
2019-04-17 15:56:28 +08:00
|
|
|
static int intel_engines_verify_workarounds(struct drm_i915_private *i915)
|
|
|
|
{
|
|
|
|
struct intel_engine_cs *engine;
|
|
|
|
enum intel_engine_id id;
|
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
for_each_engine(engine, i915, id) {
|
|
|
|
if (intel_engine_verify_workarounds(engine, "load"))
|
|
|
|
err = -EIO;
|
|
|
|
}
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
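The verification loop above does not stop at the first failing engine: it records the error and keeps going, so every engine gets checked. A small sketch of that pattern, assuming a plain array of pass/fail flags in place of engines; `sketch_verify_all` and its parameters are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Check every entry, remembering that at least one failed.
 * ok[i] is non-zero when entry i passes; *checked counts entries visited.
 */
static int sketch_verify_all(const int *ok, int n, int *checked)
{
	int i, err = 0;

	assert(ok != NULL && checked != NULL);

	*checked = 0;
	for (i = 0; i < n; i++) {
		(*checked)++;
		if (!ok[i])
			err = -EIO; /* keep looping so later entries are checked too */
	}
	return err;
}
```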
|
|
|
|
|
2016-12-01 22:16:38 +08:00
|
|
|
int i915_gem_init(struct drm_i915_private *dev_priv)
|
2012-04-24 22:47:41 +08:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2018-05-08 17:07:05 +08:00
|
|
|
/* We need to fall back to 4K pages if host doesn't support huge gtt. */
|
|
|
|
if (intel_vgpu_active(dev_priv) && !intel_vgpu_has_huge_gtt(dev_priv))
|
2017-10-07 06:18:31 +08:00
|
|
|
mkwrite_device_info(dev_priv)->page_sizes =
|
|
|
|
I915_GTT_PAGE_SIZE_4K;
|
|
|
|
|
2019-06-21 15:08:10 +08:00
|
|
|
intel_timelines_init(dev_priv);
|
2019-01-28 18:23:56 +08:00
|
|
|
|
2017-11-23 01:26:21 +08:00
|
|
|
ret = i915_gem_init_userptr(dev_priv);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2019-07-13 18:00:13 +08:00
|
|
|
intel_uc_fetch_firmwares(&dev_priv->gt.uc);
|
2019-08-03 02:40:55 +08:00
|
|
|
intel_wopcm_init(&dev_priv->wopcm);
|
2017-12-14 06:13:47 +08:00
|
|
|
|
2015-02-13 22:35:59 +08:00
|
|
|
/* This is just a security blanket to placate dragons.
|
|
|
|
* On some systems, we very sporadically observe that the first TLBs
|
|
|
|
* used by the CS may be stale, despite us poking the TLB reset. If
|
|
|
|
* we hold the forcewake during initialisation these problems
|
|
|
|
* just magically go away.
|
|
|
|
*/
|
2019-03-20 02:35:36 +08:00
|
|
|
intel_uncore_forcewake_get(&dev_priv->uncore, FORCEWAKE_ALL);
|
2015-02-13 22:35:59 +08:00
|
|
|
|
2019-06-21 15:08:05 +08:00
|
|
|
ret = i915_init_ggtt(dev_priv);
|
2017-12-13 21:43:47 +08:00
|
|
|
if (ret) {
|
|
|
|
GEM_BUG_ON(ret == -EIO);
|
|
|
|
goto err_unlock;
|
|
|
|
}
|
2013-03-09 02:45:53 +08:00
|
|
|
|
2019-09-05 19:14:03 +08:00
|
|
|
intel_gt_init(&dev_priv->gt);
|
drm/i915: Split context enabling from init
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.
To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT
Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.
v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)
v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)
v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 06:11:04 +08:00
|
|
|
|
2019-04-27 00:33:33 +08:00
|
|
|
ret = intel_engines_setup(dev_priv);
|
|
|
|
if (ret) {
|
|
|
|
GEM_BUG_ON(ret == -EIO);
|
|
|
|
goto err_unlock;
|
|
|
|
}
|
|
|
|
|
2019-10-04 21:40:09 +08:00
|
|
|
ret = i915_gem_init_contexts(dev_priv);
|
2018-12-04 22:15:16 +08:00
|
|
|
if (ret) {
|
|
|
|
GEM_BUG_ON(ret == -EIO);
|
|
|
|
goto err_scratch;
|
|
|
|
}
|
|
|
|
|
2016-12-01 22:16:38 +08:00
|
|
|
ret = intel_engines_init(dev_priv);
|
2017-12-13 21:43:47 +08:00
|
|
|
if (ret) {
|
|
|
|
GEM_BUG_ON(ret == -EIO);
|
|
|
|
goto err_context;
|
|
|
|
}
|
2013-12-07 06:11:04 +08:00
|
|
|
|
2017-11-10 22:26:29 +08:00
|
|
|
intel_init_gt_powersave(dev_priv);
|
|
|
|
|
2019-08-17 21:11:44 +08:00
|
|
|
intel_uc_init(&dev_priv->gt.uc);
|
2017-11-10 22:26:30 +08:00
|
|
|
|
2019-09-10 22:38:20 +08:00
|
|
|
ret = intel_gt_init_hw(&dev_priv->gt);
|
2017-12-14 06:13:48 +08:00
|
|
|
if (ret)
|
|
|
|
goto err_uc_init;
|
|
|
|
|
2019-06-26 23:45:49 +08:00
|
|
|
/* Only when the HW is re-initialised, can we replay the requests */
|
|
|
|
ret = intel_gt_resume(&dev_priv->gt);
|
|
|
|
if (ret)
|
|
|
|
goto err_init_hw;
|
|
|
|
|
2017-11-10 22:26:30 +08:00
|
|
|
/*
|
|
|
|
* Despite its name intel_init_clock_gating applies both display
|
|
|
|
* clock gating workarounds; GT mmio workarounds and the occasional
|
|
|
|
* GT power context workaround. Worse, sometimes it includes a context
|
|
|
|
* register workaround which we need to apply before we record the
|
|
|
|
* default HW state for all contexts.
|
|
|
|
*
|
|
|
|
* FIXME: break up the workarounds and apply them at the right time!
|
|
|
|
*/
|
|
|
|
intel_init_clock_gating(dev_priv);
|
|
|
|
|
2019-04-17 15:56:28 +08:00
|
|
|
ret = intel_engines_verify_workarounds(dev_priv);
|
|
|
|
if (ret)
|
2019-06-26 23:45:49 +08:00
|
|
|
goto err_gt;
|
2019-04-17 15:56:28 +08:00
|
|
|
|
2017-11-10 22:26:33 +08:00
|
|
|
ret = __intel_engines_record_defaults(dev_priv);
|
2017-12-13 21:43:47 +08:00
|
|
|
if (ret)
|
2019-06-26 23:45:49 +08:00
|
|
|
goto err_gt;
|
2017-12-13 21:43:47 +08:00
|
|
|
|
2019-08-03 02:40:50 +08:00
|
|
|
ret = i915_inject_load_error(dev_priv, -ENODEV);
|
|
|
|
if (ret)
|
2019-06-26 23:45:49 +08:00
|
|
|
goto err_gt;
|
2017-12-13 21:43:47 +08:00
|
|
|
|
2019-08-03 02:40:50 +08:00
|
|
|
ret = i915_inject_load_error(dev_priv, -EIO);
|
|
|
|
if (ret)
|
2019-06-26 23:45:49 +08:00
|
|
|
goto err_gt;
|
2017-12-13 21:43:47 +08:00
|
|
|
|
2019-03-20 02:35:36 +08:00
|
|
|
intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
|
2017-12-13 21:43:47 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Unwinding is complicated by the fact that we want -EIO to mean
|
|
|
|
* disable GPU submission but keep KMS alive. We want to mark the
|
|
|
|
* HW as irreversibly wedged, but keep enough state around that the
|
|
|
|
* driver doesn't explode during runtime.
|
|
|
|
*/
|
2019-06-26 23:45:49 +08:00
|
|
|
err_gt:
|
2019-09-26 21:31:40 +08:00
|
|
|
intel_gt_set_wedged_on_init(&dev_priv->gt);
|
2019-03-08 17:36:54 +08:00
|
|
|
i915_gem_suspend(dev_priv);
|
2018-06-06 22:54:41 +08:00
|
|
|
i915_gem_suspend_late(dev_priv);
|
|
|
|
|
2018-07-10 17:44:20 +08:00
|
|
|
i915_gem_drain_workqueue(dev_priv);
|
2019-06-26 23:45:49 +08:00
|
|
|
err_init_hw:
|
2019-07-13 18:00:13 +08:00
|
|
|
intel_uc_fini_hw(&dev_priv->gt.uc);
|
2017-12-14 06:13:48 +08:00
|
|
|
err_uc_init:
|
2017-12-13 21:43:47 +08:00
|
|
|
if (ret != -EIO) {
|
2019-08-17 21:11:44 +08:00
|
|
|
intel_uc_fini(&dev_priv->gt.uc);
|
2019-05-01 18:32:04 +08:00
|
|
|
intel_engines_cleanup(dev_priv);
|
2017-12-13 21:43:47 +08:00
|
|
|
}
|
|
|
|
err_context:
|
|
|
|
if (ret != -EIO)
|
2019-10-04 21:40:09 +08:00
|
|
|
i915_gem_driver_release__contexts(dev_priv);
|
2018-12-04 22:15:16 +08:00
|
|
|
err_scratch:
|
2019-09-05 19:14:03 +08:00
|
|
|
intel_gt_driver_release(&dev_priv->gt);
|
2017-12-13 21:43:47 +08:00
|
|
|
err_unlock:
|
2019-03-20 02:35:36 +08:00
|
|
|
intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
|
2017-12-13 21:43:47 +08:00
|
|
|
|
2019-01-28 18:23:56 +08:00
|
|
|
if (ret != -EIO) {
|
2019-08-12 03:51:32 +08:00
|
|
|
intel_uc_cleanup_firmwares(&dev_priv->gt.uc);
|
2017-12-13 21:43:47 +08:00
|
|
|
i915_gem_cleanup_userptr(dev_priv);
|
2019-06-21 15:08:10 +08:00
|
|
|
intel_timelines_fini(dev_priv);
|
2019-01-28 18:23:56 +08:00
|
|
|
}
|
2017-12-13 21:43:47 +08:00
|
|
|
|
2014-04-09 16:19:42 +08:00
|
|
|
if (ret == -EIO) {
|
2017-12-13 21:43:47 +08:00
|
|
|
/*
|
2019-08-12 03:51:32 +08:00
|
|
|
* Allow engines or uC initialisation to fail by marking the GPU
|
|
|
|
* as wedged. But we only want to do this when the GPU is angry,
|
2014-04-09 16:19:42 +08:00
|
|
|
* for all other failures, such as an allocation failure, bail.
|
|
|
|
*/
|
2019-07-13 03:29:53 +08:00
|
|
|
if (!intel_gt_is_wedged(&dev_priv->gt)) {
|
2019-07-12 19:24:27 +08:00
|
|
|
i915_probe_error(dev_priv,
|
|
|
|
"Failed to initialize GPU, declaring it wedged!\n");
|
2019-07-13 03:29:53 +08:00
|
|
|
intel_gt_set_wedged(&dev_priv->gt);
|
2017-10-15 22:37:25 +08:00
|
|
|
}
|
2018-07-26 16:50:32 +08:00
|
|
|
|
|
|
|
/* Minimal basic recovery for KMS */
|
|
|
|
ret = i915_ggtt_enable_hw(dev_priv);
|
|
|
|
i915_gem_restore_gtt_mappings(dev_priv);
|
2019-10-16 22:32:33 +08:00
|
|
|
i915_gem_restore_fences(&dev_priv->ggtt);
|
2018-07-26 16:50:32 +08:00
|
|
|
intel_init_clock_gating(dev_priv);
|
2012-04-24 22:47:41 +08:00
|
|
|
}
|
|
|
|
|
2017-12-13 21:43:47 +08:00
|
|
|
i915_gem_drain_freed_objects(dev_priv);
|
2014-04-09 16:19:42 +08:00
|
|
|
return ret;
|
2012-04-24 22:47:41 +08:00
|
|
|
}
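The error path of i915_gem_init() above uses stacked goto labels, but skips most of the teardown when the error is -EIO so that the driver can keep running with the GPU marked wedged. The following is a heavily simplified sketch of that control flow, not the driver's code: it has two setup steps instead of many, the `sketch_*` names and the `fail_at`/`err` parameters are hypothetical, and the recovery step is assumed to succeed.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device state tracked by the sketch. */
struct sketch_dev {
	int a_ready;
	int b_ready;
	int wedged;
};

/*
 * fail_at: 0 = no failure, 1 = first step fails, 2 = second step fails.
 * err: the (negative) error code the failing step reports.
 */
static int sketch_init(struct sketch_dev *d, int fail_at, int err)
{
	int ret;

	d->a_ready = d->b_ready = d->wedged = 0;

	ret = fail_at == 1 ? err : 0;
	if (ret)
		goto err_none;
	d->a_ready = 1;

	ret = fail_at == 2 ? err : 0;
	if (ret)
		goto err_a;
	d->b_ready = 1;

	return 0;

err_a:
	/* Ordinary failures unwind; -EIO keeps the state around. */
	if (ret != -EIO)
		d->a_ready = 0;
err_none:
	if (ret == -EIO) {
		/* Mark the device unusable for submission but report success. */
		d->wedged = 1;
		ret = 0;
	}
	return ret;
}
```

With an allocation-style failure such as -ENOMEM the sketch unwinds and returns the error; with -EIO it leaves the first step's state in place, sets the wedged flag and returns 0, mirroring the "disable GPU submission but keep KMS alive" intent described in the comment above.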
|
|
|
|
|
2019-08-06 20:42:59 +08:00
|
|
|
void i915_gem_driver_register(struct drm_i915_private *i915)
|
|
|
|
{
|
|
|
|
i915_gem_driver_register__shrinker(i915);
|
2019-08-06 20:43:00 +08:00
|
|
|
|
|
|
|
intel_engines_driver_register(i915);
|
2019-08-06 20:42:59 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
void i915_gem_driver_unregister(struct drm_i915_private *i915)
|
|
|
|
{
|
|
|
|
i915_gem_driver_unregister__shrinker(i915);
|
|
|
|
}
|
|
|
|
|
2019-07-12 19:24:29 +08:00
|
|
|
void i915_gem_driver_remove(struct drm_i915_private *dev_priv)
|
2018-06-04 17:00:32 +08:00
|
|
|
{
|
2019-06-13 15:32:54 +08:00
|
|
|
intel_wakeref_auto_fini(&dev_priv->ggtt.userfault_wakeref);
|
2019-05-27 19:51:14 +08:00
|
|
|
|
2018-06-04 17:00:32 +08:00
|
|
|
i915_gem_suspend_late(dev_priv);
|
2019-09-05 19:14:03 +08:00
|
|
|
intel_gt_driver_remove(&dev_priv->gt);
|
2018-06-04 17:00:32 +08:00
|
|
|
|
|
|
|
/* Flush any outstanding unpin_work. */
|
|
|
|
i915_gem_drain_workqueue(dev_priv);
|
|
|
|
|
2019-07-13 18:00:13 +08:00
|
|
|
intel_uc_fini_hw(&dev_priv->gt.uc);
|
|
|
|
intel_uc_fini(&dev_priv->gt.uc);
|
2019-05-30 21:31:05 +08:00
|
|
|
|
|
|
|
i915_gem_drain_freed_objects(dev_priv);
|
|
|
|
}
|
|
|
|
|
2019-07-12 19:24:28 +08:00
|
|
|
void i915_gem_driver_release(struct drm_i915_private *dev_priv)
|
2019-05-30 21:31:05 +08:00
|
|
|
{
|
2019-05-01 18:32:04 +08:00
|
|
|
intel_engines_cleanup(dev_priv);
|
2019-10-04 21:40:09 +08:00
|
|
|
i915_gem_driver_release__contexts(dev_priv);
|
2019-09-05 19:14:03 +08:00
|
|
|
intel_gt_driver_release(&dev_priv->gt);
|
2018-06-04 17:00:32 +08:00
|
|
|
|
2018-12-03 21:33:19 +08:00
|
|
|
intel_wa_list_free(&dev_priv->gt_wa_list);
|
|
|
|
|
2019-07-13 18:00:13 +08:00
|
|
|
intel_uc_cleanup_firmwares(&dev_priv->gt.uc);
|
2018-06-04 17:00:32 +08:00
|
|
|
i915_gem_cleanup_userptr(dev_priv);
|
2019-06-21 15:08:10 +08:00
|
|
|
intel_timelines_fini(dev_priv);
|
2018-06-04 17:00:32 +08:00
|
|
|
|
|
|
|
i915_gem_drain_freed_objects(dev_priv);
|
|
|
|
|
2019-10-04 21:40:09 +08:00
|
|
|
WARN_ON(!list_empty(&dev_priv->gem.contexts.list));
|
2018-06-04 17:00:32 +08:00
|
|
|
}
|
|
|
|
|
2017-01-24 19:01:35 +08:00
|
|
|
void i915_gem_init_mmio(struct drm_i915_private *i915)
|
|
|
|
{
|
|
|
|
i915_gem_sanitize(i915);
|
|
|
|
}
|
|
|
|
|
2017-11-11 07:24:47 +08:00
|
|
|
static void i915_gem_init__mm(struct drm_i915_private *i915)
|
|
|
|
{
|
|
|
|
spin_lock_init(&i915->mm.obj_lock);
|
|
|
|
|
|
|
|
init_llist_head(&i915->mm.free_list);
|
|
|
|
|
2019-05-31 04:34:59 +08:00
|
|
|
INIT_LIST_HEAD(&i915->mm.purge_list);
|
2019-06-12 18:57:20 +08:00
|
|
|
INIT_LIST_HEAD(&i915->mm.shrink_list);
|
2017-11-11 07:24:47 +08:00
|
|
|
|
2019-05-28 17:29:45 +08:00
|
|
|
i915_gem_init__objects(i915);
|
2017-11-11 07:24:47 +08:00
|
|
|
}
|
|
|
|
|
2019-09-28 01:33:49 +08:00
|
|
|
void i915_gem_init_early(struct drm_i915_private *dev_priv)
|
2008-07-31 03:06:12 +08:00
|
|
|
{
|
2017-11-11 07:24:47 +08:00
|
|
|
i915_gem_init__mm(dev_priv);
|
2019-04-25 04:07:14 +08:00
|
|
|
i915_gem_init__pm(dev_priv);
|
2017-10-16 19:40:37 +08:00
|
|
|
|
2016-08-04 23:32:36 +08:00
|
|
|
spin_lock_init(&dev_priv->fb_tracking.lock);
|
2008-07-31 03:06:12 +08:00
|
|
|
}
|
2008-12-30 18:31:46 +08:00
|
|
|
|
2018-03-23 20:34:49 +08:00
|
|
|
void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
|
2016-01-19 21:26:29 +08:00
|
|
|
{
|
2017-02-11 00:35:23 +08:00
|
|
|
i915_gem_drain_freed_objects(dev_priv);
|
2018-02-20 06:06:31 +08:00
|
|
|
GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
|
|
|
|
GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
|
2019-05-31 04:35:00 +08:00
|
|
|
WARN_ON(dev_priv->mm.shrink_count);
|
2016-01-19 21:26:29 +08:00
|
|
|
}
|
|
|
|
|
2016-09-21 21:51:07 +08:00
|
|
|
int i915_gem_freeze(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
2017-04-07 18:25:49 +08:00
|
|
|
/* Discard all purgeable objects, let userspace recover those as
|
|
|
|
* required after resuming.
|
|
|
|
*/
|
2016-09-21 21:51:07 +08:00
|
|
|
i915_gem_shrink_all(dev_priv);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-06-01 22:41:25 +08:00
|
|
|
int i915_gem_freeze_late(struct drm_i915_private *i915)
|
2016-05-14 14:26:33 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj;
|
2019-06-12 18:57:20 +08:00
|
|
|
intel_wakeref_t wakeref;
|
2016-05-14 14:26:33 +08:00
|
|
|
|
2018-06-01 22:41:25 +08:00
|
|
|
/*
|
|
|
|
* Called just before we write the hibernation image.
|
2016-05-14 14:26:33 +08:00
|
|
|
*
|
|
|
|
* We need to update the domain tracking to reflect that the CPU
|
|
|
|
* will be accessing all the pages to create and restore from the
|
|
|
|
* hibernation, and so upon restoration those pages will be in the
|
|
|
|
* CPU domain.
|
|
|
|
*
|
|
|
|
* To make sure the hibernation image contains the latest state,
|
|
|
|
* we update that state just before writing out the image.
|
2016-09-10 03:02:18 +08:00
|
|
|
*
|
|
|
|
* To try and reduce the hibernation image, we manually shrink
|
2017-04-07 18:25:49 +08:00
|
|
|
* the objects as well, see i915_gem_freeze()
|
2016-05-14 14:26:33 +08:00
|
|
|
*/
|
|
|
|
|
2019-06-14 07:21:54 +08:00
|
|
|
wakeref = intel_runtime_pm_get(&i915->runtime_pm);
|
2019-06-12 18:57:20 +08:00
|
|
|
|
|
|
|
i915_gem_shrink(i915, -1UL, NULL, ~0);
|
2018-06-01 22:41:25 +08:00
|
|
|
i915_gem_drain_freed_objects(i915);
|
2016-05-14 14:26:33 +08:00
|
|
|
|
2019-06-12 18:57:20 +08:00
|
|
|
list_for_each_entry(obj, &i915->mm.shrink_list, mm.link) {
|
|
|
|
i915_gem_object_lock(obj);
|
|
|
|
WARN_ON(i915_gem_object_set_to_cpu_domain(obj, true));
|
|
|
|
i915_gem_object_unlock(obj);
|
2016-05-14 14:26:33 +08:00
|
|
|
}
|
2019-06-12 18:57:20 +08:00
|
|
|
|
2019-06-14 07:21:54 +08:00
|
|
|
intel_runtime_pm_put(&i915->runtime_pm, wakeref);
|
2016-05-14 14:26:33 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-09-24 23:02:42 +08:00
|
|
|
void i915_gem_release(struct drm_device *dev, struct drm_file *file)
|
2009-06-03 15:27:35 +08:00
|
|
|
{
|
2010-09-24 23:02:42 +08:00
|
|
|
struct drm_i915_file_private *file_priv = file->driver_priv;
|
2018-02-21 17:56:36 +08:00
|
|
|
struct i915_request *request;
|
2009-06-03 15:27:35 +08:00
|
|
|
|
|
|
|
/* Clean up our request list when the client is going away, so that
|
|
|
|
* later retire_requests won't dereference our soon-to-be-gone
|
|
|
|
* file_priv.
|
|
|
|
*/
|
2010-09-26 18:03:27 +08:00
|
|
|
spin_lock(&file_priv->mm.lock);
|
2017-03-02 20:25:25 +08:00
|
|
|
list_for_each_entry(request, &file_priv->mm.request_list, client_link)
|
2010-09-24 23:02:42 +08:00
|
|
|
request->file_priv = NULL;
|
2010-09-26 18:03:27 +08:00
|
|
|
spin_unlock(&file_priv->mm.lock);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client receives.
Unfortunately, this technique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
}
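i915_gem_release() above walks the closing client's request list and clears each request's back-pointer, so that later retirement never dereferences the freed per-file state. A minimal sketch of that detach step follows, assuming a hand-rolled singly linked list in place of the kernel's list_head; the `sketch_*` types are hypothetical and the spinlock that guards the real list is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-client state that is about to be freed. */
struct sketch_client {
	int id;
};

/* Hypothetical request holding a back-pointer to its client. */
struct sketch_req {
	struct sketch_client *owner;
	struct sketch_req *next;
};

/* Clear every back-pointer so later users see NULL, not a dangling pointer. */
static void sketch_detach_requests(struct sketch_req *head)
{
	struct sketch_req *rq;

	for (rq = head; rq != NULL; rq = rq->next)
		rq->owner = NULL;
}
```

The requests themselves stay linked and alive; only the reference to the departing client is removed.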
|
|
|
|
|
2017-06-20 19:05:45 +08:00
|
|
|
int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file)
|
2013-09-26 00:34:56 +08:00
|
|
|
{
|
|
|
|
struct drm_i915_file_private *file_priv;
|
2013-12-07 06:10:58 +08:00
|
|
|
int ret;
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client receives.
Unfortunately, this technique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-26 00:34:56 +08:00
|
|
|
|
2016-11-09 18:45:07 +08:00
|
|
|
DRM_DEBUG("\n");
|
2013-09-26 00:34:56 +08:00
|
|
|
|
|
|
|
file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
|
|
|
|
if (!file_priv)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
file->driver_priv = file_priv;
|
2017-06-20 19:05:45 +08:00
|
|
|
file_priv->dev_priv = i915;
|
2014-02-25 23:11:24 +08:00
|
|
|
file_priv->file = file;
|
2013-09-26 00:34:56 +08:00
|
|
|
|
|
|
|
spin_lock_init(&file_priv->mm.lock);
|
|
|
|
INIT_LIST_HEAD(&file_priv->mm.request_list);
|
|
|
|
|
2016-07-27 16:07:27 +08:00
|
|
|
file_priv->bsd_engine = -1;
|
2018-06-15 18:44:29 +08:00
|
|
|
file_priv->hang_timestamp = jiffies;
|
2016-01-15 23:12:50 +08:00
|
|
|
|
2017-06-20 19:05:45 +08:00
|
|
|
ret = i915_gem_context_open(i915, file);
|
2013-12-07 06:10:58 +08:00
|
|
|
if (ret)
|
|
|
|
kfree(file_priv);
|
2013-09-26 00:34:56 +08:00
|
|
|
|
2013-12-07 06:10:58 +08:00
|
|
|
return ret;
|
2013-09-26 00:34:56 +08:00
|
|
|
}
|
|
|
|
|
2017-02-14 01:15:13 +08:00
|
|
|
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
|
2017-02-14 01:15:17 +08:00
|
|
|
#include "selftests/mock_gem_device.c"
|
2018-08-30 21:48:06 +08:00
|
|
|
#include "selftests/i915_gem.c"
|
2017-02-14 01:15:13 +08:00
|
|
|
#endif
|