A couple of operations, the flushes and the tracepoint, do not require
serialisation by client->wq_lock, so move them before we take it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170228112803.11646-3-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Following the use of dma_fence_signal() from within our interrupt
handler, we need to make guc->wq_lock also irq-safe. This was done
previously as part of the guc scheduler patch (which also started
mixing our fences with the interrupt handler), but is now required to
fix the current guc submission backend.
v4: Document that __i915_guc_submit is always under an irq disabled
section
v5: Move wq_rsvd adjustment to its own function
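Roughly, the resulting locking pattern looks like the sketch below; the
client structure and function name are illustrative, not the actual driver
code. The point is simply that any adjustment of the reserved-space
counter under wq_lock now uses the irq-disabling lock variants.

#include <linux/spinlock.h>

struct example_guc_client {
	spinlock_t wq_lock;	/* now also taken from the irq path */
	unsigned int wq_rsvd;	/* reserved space in the GuC workqueue */
};

static void example_adjust_wq_rsvd(struct example_guc_client *client, int delta)
{
	unsigned long flags;

	/* irqsave variant: safe even if an interrupt handler takes wq_lock */
	spin_lock_irqsave(&client->wq_lock, flags);
	client->wq_rsvd += delta;
	spin_unlock_irqrestore(&client->wq_lock, flags);
}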
Fixes: 67b807a892 ("drm/i915: Delay disabling the user interrupt for breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170228112803.11646-2-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Two new tracepoints, placed at the call sites where requests are actually
passed to the GPU, enable userspace to track engine utilisation.
These tracepoints are only enabled when the
DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option is enabled.
v2: Fix compilation with !CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS.
v3: Name global seqno consistently across tracepoints.
v4: Remove port info from request out tracepoint. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
With the introduction of i915_vma_instance() for obtaining the VMA
singleton for an (obj, vm, view) tuple, we can remove i915_vma_create()
in favour of a single entry point. We do incur a lookup onto an empty
tree, but i915_vma_create() was being called infrequently and during
initialisation, so the small overhead is negligible.
v2: Drop the i915_ prefix from the now static vma_create() function
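In spirit the single entry point is a lookup-or-create; a hedged sketch
with placeholder types and helpers (only vma_create() is named in this
patch, the rest is illustrative):

struct obj;
struct vm;
struct view;
struct vma;

/* Assumed helpers for the sketch: a tree lookup and the now-static
 * constructor mentioned above. */
struct vma *vma_tree_lookup(struct obj *obj, struct vm *vm, const struct view *view);
struct vma *vma_create(struct obj *obj, struct vm *vm, const struct view *view);

/* Single entry point: return the singleton, creating it on a miss. */
struct vma *vma_instance(struct obj *obj, struct vm *vm, const struct view *view)
{
	struct vma *vma = vma_tree_lookup(obj, vm, view);

	return vma ? vma : vma_create(obj, vm, view);
}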
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-4-chris@chris-wilson.co.uk
Rename some of the GuC fw loading code to make it more general. We
will utilise it for HuC loading as well.
s/intel_guc_fw/intel_uc_fw/g
s/GUC_FIRMWARE/INTEL_UC_FIRMWARE/g
Struct intel_guc_fw is renamed to intel_uc_fw. The prefix of its members,
such as 'guc' or 'guc_fw', is either renamed to 'uc' or removed for the
same purpose.
v2: rebased on top of nightly.
reapplied the search/replace as upstream code has changed.
v3: removed G from messages in shared fw fetch function.
v4: rebased. Updated dev to dev_priv in intel_guc_setup(), guc_fw_getch()
and intel_guc_init().
v5: rebased. Move uint32_t fw_type to patch 2. Add INTEL_ prefix for
fields in enum intel_uc_fw_status. Remove uc_dev field since it's never
used. Rename uc_fw to just fw and guc_fw to fw to avoid redundancy.
v6: rebased. Remove sections of code that were commented and no longer
required.
v7: rebased. Remove uc_fw_ prefix from path and obj fields
in intel_uc_fw struct as suggested by Michal.
v8: rebased. Add declaration of intel_guc_wopcm_size() in
this patch instead of patch 3.
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Peter Antoine <peter.antoine@intel.com>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484356631-16139-2-git-send-email-anusha.srivatsa@intel.com
Functions supporting GuC logging capabilities were spread across
many files, with unnecessary exposure and mixed with unrelated
code. A dedicated file will make maintenance of all GuC functions
easier as more functions are coming to support GuC submission.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170113174157.104492-1-michal.wajdeczko@intel.com
Move the GuC invalidation of its ggtt TLB to where we perform the ggtt
modification rather than proliferate it into all the callers of the
insert (which may or may not in fact have to do the insertion).
v2: Just do the guc invalidate unconditionally, (afaict) it has no impact
without the guc loaded on gen8+
v3: Conditionally invalidate the guc - just in case that register has
not been validated for other modes.
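The shape of the change, as a rough sketch with illustrative names (the
real insert path, invalidate helper and register write differ):

#include <linux/types.h>

struct example_ggtt {
	bool guc_active;	/* v3: only poke the GuC TLB when in use */
};

/* Assumed helpers for the sketch. */
void example_write_ptes(struct example_ggtt *ggtt, u64 start, u64 length);
void example_guc_invalidate_tlb(struct example_ggtt *ggtt);

static void example_ggtt_insert(struct example_ggtt *ggtt, u64 start, u64 length)
{
	example_write_ptes(ggtt, start, length);

	/* the invalidate now lives with the PTE write, not in every caller */
	if (ggtt->guc_active)
		example_guc_invalidate_tlb(ggtt);
}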
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170112110050.25333-1-chris@chris-wilson.co.uk
In order to silence sparse:
../drivers/gpu/drm/i915/i915_gpu_error.c:200:39: warning: Using plain integer as NULL pointer
add a helper to check whether we have sse4.1 and that the desired
alignment is valid for acceleration.
v2: Explain the macros and split the two use cases between
i915_has_memcpy_from_wc() and i915_can_memcpy_from_wc().
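A hedged sketch of the v2 split, with a placeholder CPU feature test
standing in for the driver's actual check:

#include <linux/types.h>

bool example_cpu_has_sse41(void);	/* assumed feature test */

static inline bool example_has_memcpy_from_wc(void)
{
	return example_cpu_has_sse41();	/* movntdqa requires SSE4.1 */
}

static inline bool example_can_memcpy_from_wc(const void *dst, const void *src,
					      unsigned long len)
{
	/* movntdqa wants 16-byte aligned addresses and length */
	return example_has_memcpy_from_wc() &&
	       !(((unsigned long)dst | (unsigned long)src | len) & 15);
}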
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-1-chris@chris-wilson.co.uk
Add an assertion to the plain i915_ggtt_offset() to double check that
any offset we hand to the GuC is outside of its unmappable ranges.
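A minimal sketch of the idea, with a made-up limit and a plain WARN_ON
standing in for whatever assertion macro and actual bounds the driver
uses:

#include <linux/bug.h>
#include <linux/types.h>

#define EXAMPLE_GUC_RESERVED_TOP 0x80000u	/* illustrative limit only */

static inline u32 example_guc_ggtt_offset(u32 ggtt_offset)
{
	/* never hand the GuC an address inside its unusable range */
	WARN_ON(ggtt_offset < EXAMPLE_GUC_RESERVED_TOP);
	return ggtt_offset;
}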
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161224193146.4402-1-chris@chris-wilson.co.uk
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, wreaking minor
havoc in the seqno accounting, as we would consider the current request to
already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow us
to overwrite the current request before execution.
We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.
We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficulty.
And finally an ulterior motive for unifying context handling was to
prepare for mock requests.
v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.
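A simplified sketch of the pin-until-next-retirement scheme, with
illustrative stand-ins for the engine vfuncs and tracking described above
(not the actual driver structures):

struct example_context;

struct example_engine {
	int (*context_pin)(struct example_engine *engine,
			   struct example_context *ctx);
	void (*context_unpin)(struct example_engine *engine,
			      struct example_context *ctx);
	struct example_context *last_retired_context;
};

struct example_request {
	struct example_engine *engine;
	struct example_context *ctx;	/* pinned when the request was created */
};

/* On retirement the previous context is known to be idle and coherent in
 * memory, so only now is it safe to drop its pin. */
static void example_retire(struct example_request *rq)
{
	struct example_engine *engine = rq->engine;

	if (engine->last_retired_context)
		engine->context_unpin(engine, engine->last_retired_context);
	engine->last_retired_context = rq->ctx;
}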
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
We usually use 'client' as the identifier for the i915_guc_client.
For an unknown reason, a few functions were using the name 'gc'.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[joonas.lahtinen@linux.intel.com: Split two lines over 80]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161215195321.63804-1-michal.wajdeczko@intel.com
Like GEM init, GuC init, MOCS init and context creation, this enables
them to lose their dev_priv locals.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Makes all GEM object constructors consistent.
v2: Fix compilation in GVT code.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
In order to avoid some complexity in trying to reconstruct the
workqueues across reset, remember them instead. The issue arises when we
have to handle a reset between request allocation and submission: the
request has reserved space in the wq, but is not on any list, so we fail
to restore the reserved space. By keeping the execbuf client intact
across the reset, we also keep the reservations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161129121024.22650-5-chris@chris-wilson.co.uk
send_mutex is used to serialise communication with GuC via
intel_guc_send().
Since functions that utilize it are no longer limited to submission,
initialization should be handled as a part of general setup.
v2: move initialization to *_early()
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1480096777-12573-5-git-send-email-arkadiusz.hiler@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
guc_send(), guc_recv() and related functions were introduced in
i915_guc_submission.c and their scope was limited only to that file.
They are not submission specific though.
This patch moves them to intel_uc.c with the intel_ prefix added.
v2: rename intel_guc_log_* functions and clean up intel_guc_send usages
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1480096777-12573-4-git-send-email-arkadiusz.hiler@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
To facilitate code reorganization we are renaming everything that
contains guc2host or host2guc.
host2guc_action() and host2guc_action_response() become guc_send()
and guc_recv() respectively.
Other host2guc_*() functions become simply guc_*().
Other entities are renamed based on the context they appear in:
- HOST2GUC_ACTIONS_* become INTEL_GUC_ACTION_*
- HOST2GUC_{INTERRUPT,TRIGGER} become GUC_SEND_{INTERRUPT,TRIGGER}
- GUC2HOST_STATUS_* become INTEL_GUC_STATUS_*
- GUC2HOST_MSG_* become INTEL_GUC_RECV_MSG_*
- action_lock becomes send_mutex
v2: drop unnecessary backslashes and use BIT() instead of '<<'
v3: shortened enum names and INTEL_GUC_STATUS_*
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1480096777-12573-3-git-send-email-arkadiusz.hiler@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
GuC is not the only microcontroller we have; there are also HuC and DMC.
Making the file more general will help with code organization.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1480096777-12573-2-git-send-email-arkadiusz.hiler@intel.com
Track the priority of each request and use it to determine the order in
which we submit requests to the hardware via execlists.
The priority of the request is determined by the user (eventually via
the context) but may be overridden at any time by the driver. When we set
the priority of the request, we bump the priority of all of its
dependencies to match - so that a high priority drawing operation is not
stuck behind a background task.
When the request is ready to execute (i.e. we have signaled the submit
fence following completion of all its dependencies, including third
party fences), we put the request into a priority sorted rbtree to be
submitted to the hardware. If the request is higher priority than all
pending requests, it will be submitted on the next context-switch
interrupt as soon as the hardware has completed the current request. We
do not yet preempt the currently executing request to immediately run a
very high priority request.
One more limitation is that this first implementation is for execlists
only, so it is currently limited to gen8/gen9.
v2: Replace recursive priority inheritance bumping with an iterative
depth-first search list.
v3: list_next_entry() for walking lists
v4: Explain how the dfs solves the recursion problem with PI.
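A sketch of the iterative depth-first bump, using simplified placeholder
types (the driver's real request/dependency structures are richer):
dependencies are appended to a local list and walked until it is
exhausted, so no recursion is needed.

#include <linux/list.h>

struct example_request;

struct example_dep {				/* one dependency edge */
	struct example_request *signaler;	/* the request depended upon */
	struct list_head signal_link;		/* on the dependent's list */
	struct list_head dfs_link;		/* empty-initialised while idle */
};

struct example_request {
	int priority;
	struct list_head signalers;		/* list of example_dep */
	struct example_dep stub;		/* seeds the walk below */
};

/* Raise rq and everything it transitively depends on to at least prio. */
static void example_bump_priority(struct example_request *rq, int prio)
{
	LIST_HEAD(dfs);
	struct example_dep *dep, *p, *next;

	rq->stub.signaler = rq;
	list_add(&rq->stub.dfs_link, &dfs);

	/* list_for_each_entry() also visits entries appended to the tail
	 * while walking, which is what makes the depth-first search
	 * iterative rather than recursive. */
	list_for_each_entry(dep, &dfs, dfs_link) {
		struct example_request *node = dep->signaler;

		if (node->priority >= prio)
			continue;
		node->priority = prio;

		list_for_each_entry(p, &node->signalers, signal_link)
			if (p->signaler->priority < prio)
				list_move_tail(&p->dfs_link, &dfs);
	}

	/* detach the temporary links before the local list head goes away */
	list_for_each_entry_safe(dep, next, &dfs, dfs_link)
		list_del_init(&dep->dfs_link);
}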
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk
Defer the transfer from the client's timeline onto the execution
timeline from the point of readiness to the point of actual submission.
For example, in execlists, a request is finally submitted to hardware
when the hardware is ready, and only put onto the hardware queue when
the request is ready. By deferring the transfer, we ensure that the
timeline is maintained in retirement order if we decide to queue the
requests onto the hardware in a different order than fifo.
v2: Rebased onto distinct global/user timeline lock classes.
v3: Play with the position of the spin_lock().
v4: Nesting finally resolved with distinct sw_fence lock classes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-4-chris@chris-wilson.co.uk
Though we will have multiple timelines, we still have a single timeline
of execution. This we can use to provide an execution and retirement order
of requests. This keeps tracking the execution of requests simple, and is
vital for preserving a single waiter (i.e. so that we can order the waiters
such that only the earliest to wake up need be woken). To accomplish this we
distinguish the seqno used to order requests per-context (external) and
that used internally for execution.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
Our timelines are more than just a seqno. They also provide an ordered
list of requests to be executed. Due to the restriction of handling
individual address spaces, we are limited to a timeline per address
space but we use a fence context per engine within.
Our first step to introducing independent timelines per context (i.e. to
allow each context to have a queue of requests to execute that have a
defined set of dependencies on other requests) is to provide a timeline
abstraction for the global execution queue.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
The driver accesses the ringbuffer pages via the GMADR BAR if the pages
are pinned in the mappable aperture portion of the GGTT; for ringbuffer
pages allocated from stolen memory, access can only be done through the
GMADR BAR.
In the case of GuC-based submission, updates done to the ringbuffer via
GMADR may not get committed to memory by the time the command streamer
starts reading them, resulting in it fetching stale data.
For host-based submission this problem does not arise, as the write to
the ring tail or ELSP register happens from the host side prior to
submission.
Access to any GFX register from the CPU side goes through the GTTMMADR
BAR, and the hardware already enforces ordering between outstanding GMADR
writes and new GTTMMADR accesses.
MMIO writes from the GuC side do not go through the GTTMMADR BAR, as GuC
communication with registers within the GT is contained within the GT, so
the ordering is not enforced, resulting in a race which can manifest as a
hang.
To ensure that in-flight GMADR writes are flushed, a posting read of a
GuC register is done prior to ringing the doorbell.
There is already a similar WA in
i915_gem_object_flush_gtt_write_domain(), which takes care of GMADR
writes from userspace to GEM buffers, but not of the ringbuffer writes
from the KMD.
This WA is needed on all recent HW.
v2:
- Use POSTING_READ_FW instead of POSTING_READ, as GuC registers do not lie
in any forcewake domain range, so the overhead of the spinlock & search
in the forcewake table is avoidable. (Chris)
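A rough sketch of the ordering trick, with all helpers as placeholders
(the real driver uses its own register accessors and a specific GuC
register):

#include <linux/types.h>

struct example_client;

/* Assumed helpers for the sketch. */
void example_write_wq_tail_via_gmadr(struct example_client *client);
u32 example_read_any_guc_reg(struct example_client *client);
void example_ring_doorbell_reg(struct example_client *client);

static void example_submit(struct example_client *client)
{
	/* ringbuffer/workqueue updates go through the GMADR aperture... */
	example_write_wq_tail_via_gmadr(client);

	/* ...so read back a GuC register (a posting read; the driver uses
	 * the _FW variant since GuC registers sit outside any forcewake
	 * domain) to flush those writes before the doorbell is rung. */
	(void)example_read_any_guc_reg(client);

	example_ring_doorbell_reg(client);
}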
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1477413323-1880-1-git-send-email-akash.goel@intel.com
The GuC log buffer flush work item has to do a register access to send
the ack to the GuC, and this work item, if not synced before suspend, can
potentially get executed after the GFX device is suspended. The work item
function uses rpm get/put calls around the HW access, which covers the
rpm suspend case, but for system suspend a sync would be required, as the
kernel can potentially schedule work items even after some devices,
including GFX, have been put to suspend. The sync has to be done only for
the system suspend case, as a sync together with rpm get/put can cause a
deadlock in the rpm suspend path.
To make the sync effectively a no-op for the rpm suspend path as well,
the work item could have been queued from the irq handler only when the
device is runtime active, with the device kept active while the work item
is pending or being executed; but an interrupt can arrive even after the
device is out of use, which could then lead to the work item being
missed.
By marking the workqueue dedicated to handling GuC log buffer flush
interrupts as freezable, we don't have to bother with flushing this work
item from the suspend hooks: any pending work item will either be
executed before the suspend or scheduled later on resume. This way the
handling of the log buffer flush work item can be kept the same between
system suspend & rpm suspend.
Suggested-by: Imre Deak <imre.deak@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
As per the current i915 driver load sequence, debugfs registration is
done at the end, so the relay channel debugfs file is also created after
that, whereas the GuC firmware is loaded much earlier in the sequence.
As a result the driver could miss capturing the boot-time logs of the GuC
firmware if there are flush interrupts from the GuC side.
Relay has a provision to support early logging, whereby initially only
the relay channel is created, to have buffers for storing logs, and later
on the channel can be associated with a debugfs file at the appropriate
time.
We avail ourselves of that, which allows the driver to capture the
boot-time logs as well; they can be collected once userspace comes up.
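A hedged sketch of the two-step relay usage assumed here (callbacks,
sizes and file name are illustrative): open the channel with no debugfs
parent first, attach the file later once debugfs exists.

#include <linux/relay.h>

static struct rchan *example_guc_log_relay_open_early(size_t subbuf_size,
						      size_t n_subbufs,
						      struct rchan_callbacks *cb)
{
	/* NULL base_filename/parent: buffers are created, no debugfs file yet */
	return relay_open(NULL, NULL, subbuf_size, n_subbufs, cb, NULL);
}

static int example_guc_log_relay_attach(struct rchan *chan, struct dentry *parent)
{
	/* much later, after debugfs registration, expose the channel */
	return relay_late_setup_files(chan, "guc_log", parent);
}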
v2:
- Remove the couple of FIXMEs, as now the relay channel will be created
early before enabling the flush interrupts, so no possibility of relay
channel pointer being modified & read at the same time from 2 different
execution contexts.
- Rebase.
v3:
- Add a comment to justify setting 'is_global' before the NULL check on the
parent directory dentry pointer.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
To ensure that we always get up-to-date data from the log buffer, it's
better to access the buffer through an uncached CPU mapping. Also, given
the way the buffer is accessed from the GuC & host side, manually doing a
cache flush may not always be effective if a cached CPU mapping is used.
In order to avoid any performance drop & have fast reads from the GuC log
buffer, use the SSE4.1 movntdqa-based memcpy function
i915_memcpy_from_wc, as copying with movntdqa from WC-type memory is
almost as fast as reading from WB memory.
This way the log buffer sampling time will not increase, and so we will
be able to deal with the flush interrupt storm when the GuC is generating
logs at a very high rate.
Ideally SSE 4.1 should be present on all chipsets supporting GuC-based
submission, but if not then logging will not be enabled.
v2: Rebase.
v3: Squash the WC type vmalloc mapping patch with this patch. (Chris)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
This patch provides a debugfs interface, i915_guc_output_control, for
enabling/disabling logging in the GuC firmware on the fly and for
controlling the verbosity level of the logs.
The value written to the file should have bit 0 set to enable logging,
and bits 4-7 should contain the verbosity info.
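For illustration, the layout described above decodes as in this small
sketch (names are made up, not the driver's parsing code):

#include <linux/types.h>

static void example_parse_log_control(u64 val, bool *enable, u32 *verbosity)
{
	*enable = val & 0x1;		/* bit 0: logging on/off */
	*verbosity = (val >> 4) & 0xf;	/* bits 4-7: verbosity level */
}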
v2: Add a forceful flush, to collect left over logs, on disabling logging.
Useful for Validation.
v3: Besides minor cleanup, implement read method for the debugfs file and
set the guc_log_level to -1 when logging is disabled. (Tvrtko)
v4: Minor cleanup & rebase. (Tvrtko)
v5:
- Lock struct_mutex after the NULL check for guc log buffer vma. (Chris)
- Rebase.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The GuC firmware sends a flush interrupt to the host when the log buffer
is half full, and only at that time does it update the log buffer state.
But in certain cases, as described below, it could be useful to have all
of that even when the log buffer is only partially full. For that there
is a forced log buffer flush Host2GuC action supported by the GuC
firmware.
For validation requirements, a forced flush is needed to collect the
leftover logs when disabling logging. The same can be done before
proceeding with a GPU/GuC reset, as there could be some data in the log
buffer which is yet to be captured, and those logs would be particularly
useful in understanding why the reset was initiated.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The GuC firmware sends an interrupt to flush the log buffer when it
becomes half full, so the driver doesn't really need to sample the
complete buffer and can copy only the data newly written by the GuC into
the local buffer, i.e. as per the read & write pointer values.
Moreover, the flush interrupt would generally come for one type of log
buffer, when it becomes half full, so at that time the other two types of
log buffer would comparatively have much less unread data in them.
In the case of an overflow reported by the GuC, the driver does need to
copy the entire buffer, as the whole buffer would then contain unread
data.
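A sketch of that copy rule under the stated assumptions (a simple
circular buffer with read/write offsets; names are illustrative):

#include <linux/string.h>
#include <linux/types.h>

static void example_copy_log(u8 *dst, const u8 *src, u32 size,
			     u32 read, u32 write, bool overflow)
{
	if (overflow) {
		/* whole buffer is unread after an overflow */
		memcpy(dst, src, size);
	} else if (write >= read) {
		memcpy(dst + read, src + read, write - read);
	} else {
		/* the write pointer has wrapped around */
		memcpy(dst + read, src + read, size - read);
		memcpy(dst, src, write);
	}
}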
v2: Rebase.
v3: Fix the blooper of doing the copy twice. (Tvrtko)
v4: Add curlies for 'else' case also, matching the 'if'. (Tvrtko)
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
GuC firmware sends an interrupt to flush the log buffer when it
becomes half full. GuC firmware also tracks how many times the
buffer overflowed.
It would be useful to maintain statistics of how many flush interrupts
were received and for which type of log buffer, along with the overflow
count of each buffer type.
Augmented the i915_log_info debugfs file to report back these statistics.
v2:
- Update the logic to detect multiple overflows between the 2
flush interrupts and also log a message for overflow (Tvrtko)
- Track the number of times there was no free sub buffer to capture
the GuC log buffer. (Tvrtko)
v3:
- Fix the printf field width for overflow counter, set it to 10 as per the
max value of u32, which takes 10 digits in decimal form. (Tvrtko)
v4:
- Move the log buffer overflow handling to a new function for better
readability. (Tvrtko)
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
With the addition of new Host2GuC actions related to GuC logging, there
is a need for a lock to serialize them, as they can execute concurrently
with each other and also with other existing actions.
v2: Use mutex in place of spinlock to serialize, as sleep can happen
while waiting for the action's response from GuC. (Tvrtko)
v3: To conform to the general rules, acquire mutex before taking the
forcewake. (Tvrtko)
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Added a new debugfs interface, '/sys/kernel/debug/dri/guc_log', for the
user to capture GuC firmware logs. The relay framework is used to
implement the interface, so the driver just has to use the relay API to
store snapshots of the GuC log buffer in the buffer managed by relay.
The snapshot will be taken when the GuC firmware sends a log buffer flush
interrupt, and up to four snapshots can be stored in the relay buffer.
The relay buffer will be operated in a mode where it overwrites data not
yet collected by the user.
Besides the mmap method, through which the user can directly access the
relay buffer contents, relay also supports the 'poll' method. Through a
'poll' call on the log file, the user can learn whenever a new snapshot
of the log buffer is taken by the driver, and so can run in tandem with
the driver and capture the logs in a sustained/streaming manner, without
any loss of data.
v2: Defer the creation of relay channel & associated debugfs file, as
debugfs setup is now done at the end of i915 Driver load. (Chris)
v3:
- Switch to no-overwrite mode for relay.
- Fix the relay sub buffer switching sequence.
v4:
- Update i915 Kconfig to select RELAY config. (Tvrtko)
- Log a message when there is no sub buffer available to capture
the GuC log buffer. (Tvrtko)
- Increase the number of relay sub buffers to 8 from 4, to have
sufficient buffering for boot time logs
v5:
- Fix the alignment, indentation issues and some minor cleanup. (Tvrtko)
- Update the comment to elaborate on why a relay channel has to be
associated with the debugfs file. (Tvrtko)
v6:
- Move the write to 'is_global' after the NULL check on parent directory
dentry pointer. (Tvrtko)
v7: Add a BUG_ON to validate relay buffer allocation size. (Chris)
Testcase: igt/tools/intel_guc_logger
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The GuC ukernel sends an interrupt to the host to flush the log buffer
and expects the host to correspondingly update the read pointer
information in the state structure, once it has consumed the log buffer
contents by copying them to a file or buffer.
Even if the host couldn't copy the contents, it can still update the read
pointer so that the logging state is not disturbed on the GuC side.
v2:
- Use a dedicated workqueue for handling flush interrupt. (Tvrtko)
- Reduce the overall log buffer copying time by skipping the copy of
crash buffer area for regular cases and copying only the state
structure data in first page.
v3:
- Create a vmalloc mapping of log buffer. (Chris)
- Cover the flush acknowledgment under rpm get & put.(Chris)
- Revert the change of skipping the copy of crash dump area, as
not really needed, will be covered by subsequent patch.
v4:
- Destroy the wq under the same condition in which it was created,
pass a dev_priv pointer instead of dev to the newly added GuC function,
add more comments & rename a variable for clarity. (Tvrtko)
v5:
- Allocate & destroy the dedicated wq, for handling flush interrupt,
from the setup/teardown routines of GuC logging. (Chris)
- Validate the log buffer size value retrieved from state structure
and do some minor cleanup. (Tvrtko)
- Fix error/warnings reported by checkpatch. (Tvrtko)
- Rebase.
v6:
- Remove the interrupts_enabled check from guc_capture_logs_work; we need
to process that last work item as well, queued just before disabling the
interrupt, as log buffer flush interrupt handling is a somewhat different
case where the GuC is actually expecting an ACK from the host, which
should be provided to keep the logging going.
The sync against the work will be done by the caller disabling the
interrupt.
- Don't sample the log buffer size value from the state structure; directly
use the expected value to move the pointer & do the copy, and that cannot
go wrong (out of bounds) as the driver only allocated the log buffer and
the relay buffers. The driver should refrain from interpreting the log
packet as much as possible and let the userspace parser detect the
anomaly. (Chris)
v7:
- Use switch statement instead of 'if else' for retrieving the GuC log
buffer size. (Tvrtko)
- Refactored the log buffer copying function and shortened the names of a
couple of variables for better readability. (Tvrtko)
v8:
- Make the dedicated wq a high priority one to further reduce the
turnaround time of handling the log buffer flush event from the GuC.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
There are certain types of interrupts which the host can receive from the
GuC. The GuC ukernel sends an interrupt to the host for certain events,
for example to retrieve/consume the logs generated by the ukernel.
This patch adds support to receive interrupts from the GuC, but currently
enables & partially handles only the interrupt sent by the GuC ukernel.
Future patches will add support for handling other interrupt types.
v2:
- Use common low level routines for PM IER/IIR programming (Chris)
- Rename interrupt functions to gen9_xxx from gen8_xxx (Chris)
- Replace disabling of wake ref asserts with rpm get/put (Chris)
v3:
- Update comments for more clarity. (Tvrtko)
- Remove the masking of the GuC interrupt, which was kept masked till the
start of the bottom half; it's not really needed as there is only a
single instance of the work item & the wq is ordered. (Tvrtko)
v4:
- Rebase.
- Rename guc_events to pm_guc_events so as to be indicative of the
register/control block it is associated with. (Chris)
- Add handling for back to back log buffer flush interrupts.
v5:
- Move the read & clearing of register, containing Guc2Host message
bits, outside the irq spinlock. (Tvrtko)
v6:
- Move the log buffer flush interrupt related stuff to the following
patch so as to do only generic bits in this patch. (Tvrtko)
- Rebase.
v7:
- Remove the interrupts_enabled check from gen9_guc_irq_handler; we want
to process that last interrupt as well before disabling the interrupt,
and the sync against the work queued by the irq handler will be done by
the caller disabling the interrupt.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
So far there were two fields related to GuC logs in the 'intel_guc'
structure. To support capturing GuC logs & storing them in a local
buffer, multiple new fields would have to be added. This warrants a
separate structure to contain the fields related to the GuC logging
state.
Added a new structure, 'intel_guc_log', and an instance of it inside the
'intel_guc' structure.
v2: Rebase.
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
GuC log buffer allocation was tied to the verbosity level module
parameter i915.guc_log_level. The user will be given a provision to
enable firmware logging at runtime, through a host2guc action, and not
necessarily during driver load time. But the address of the log buffer
can be passed only in the init params, at firmware load time, so the GuC
would have to be reset and the firmware reloaded in order to pass the log
buffer address at runtime.
To avoid a reset of the GuC & a reload of the firmware, the log buffer is
now always allocated, but logging is initially enabled on the GuC side
based on the value of the module parameter guc_log_level.
v2: Update commit message to describe the constraint with allocation of
log buffer at runtime. (Tvrtko)
v3: Rebase.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
With the possibility of many more rings being added in the future, the
drm_i915_private structure could bloat, as an array of type
intel_engine_cs is embedded inside it:
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine, as generally there is only a single instance
of the drm_i915_private structure in use, not all of the possible rings
would be enabled or active on most platforms. Some memory can be saved by
allocating the intel_engine_cs structure only for the enabled/active
engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is
simply indexed using the enums defined in intel_engine_id.
To save memory while continuing to use the static engine/ring IDs,
'engine' is defined as an array of pointers:
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
the i915.o file (for the i915.ko file the text size remains the same, at
1193131 bytes).
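Schematically, the iteration then just skips the NULL slots; a hedged
sketch with illustrative names (the real for_each_engine macros differ):

#define EXAMPLE_NUM_ENGINES 5

struct example_engine {
	int id;
};

struct example_private {
	/* array of pointers: disabled engines are simply NULL */
	struct example_engine *engine[EXAMPLE_NUM_ENGINES];
};

#define example_for_each_engine(engine__, i915__, id__) \
	for ((id__) = 0; (id__) < EXAMPLE_NUM_ENGINES; (id__)++) \
		if (!((engine__) = (i915__)->engine[(id__)])) {} else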
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
Currently the presumption is that the request construction and its
submission to the GuC are all under the same holding of struct_mutex. We
wish to relax this to separate the request construction and the later
submission to the GuC. This requires us to reserve some space in the
GuC command queue for the future submission. For flexibility to handle
out-of-order request submission we do not preallocate the next slot in
the GuC command queue during request construction, just ensuring that
there is enough space later.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-17-chris@chris-wilson.co.uk
Drive final request submission from a callback from the fence. This way
the request is queued until all dependencies are resolved, at which
point it is handed to the backend for queueing to hardware. At this
point, no dependencies are set on the request, so the callback is
immediate.
A side-effect of imposing a heavier-irqsafe spinlock for execlist
submission is that we lose the softirq enabling after scheduling the
execlists tasklet. To compensate, we manually kickstart the softirq by
disabling and enabling the bh around the fence signaling.
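A minimal sketch of that compensation (the surrounding fence handling is
elided; only the bh disable/enable pair is the point here):

#include <linux/bottom_half.h>
#include <linux/dma-fence.h>

static void example_signal_with_bh_kick(struct dma_fence *fence)
{
	local_bh_disable();
	dma_fence_signal(fence);	/* may schedule the execlists tasklet */
	local_bh_enable();		/* pending softirqs run here */
}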
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-14-chris@chris-wilson.co.uk
Update reset path in preparation for engine reset which requires
identification of incomplete requests and associated context and fixing
their state so that engine can resume correctly after reset.
The request that caused the hang will be skipped and head is reset to the
start of breadcrumb. This allows us to resume from where we left-off.
Since this request didn't complete normally we also need to cleanup elsp
queue manually. This is vital if we employ nonblocking request
submission where we may have a web of dependencies upon the hung request
and so advancing the seqno manually is no longer trivial.
ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS
We change the way we count pending batches. Only the active context
involved in the reset is marked as either innocent or guilty; we no
longer mark the entire world as pending. By inspection this only affects
igt/gem_reset_stats (which assumes implementation details) and not
piglit.
ARB_robustness gives this guide on how we expect the user of this
interface to behave:
* Provide a mechanism for an OpenGL application to learn about
graphics resets that affect the context. When a graphics reset
occurs, the OpenGL context becomes unusable and the application
must create a new context to continue operation. Detecting a
graphics reset happens through an inexpensive query.
And with regards to the actual meaning of the reset values:
Certain events can result in a reset of the GL context. Such a reset
causes all context state to be lost. Recovery from such events
requires recreation of all objects in the affected context. The
current status of the graphics reset state is returned by
enum GetGraphicsResetStatusARB();
The symbolic constant returned indicates if the GL context has been
in a reset state at any point since the last call to
GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context
has not been in a reset state since the last call.
GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected
that is attributable to the current GL context.
INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that
is not attributable to the current GL context.
UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose
cause is unknown.
The language here is explicit in that we must mark up the guilty batch,
but is loose enough for us to relax the innocent (i.e. pending)
accounting as only the active batches are involved with the reset.
In the future, we are looking towards single engine resetting (with
minimal locking), where it seems inappropriate to mark the entire world
as innocent since the reset occurred on a different engine. Reducing the
information available means we only have to encounter the pain once, and
also reduces the information leaking from one context to another.
v2: Legacy ringbuffer submission required a reset following hibernation,
or else we restored stale values to RING_HEAD and walked over
stolen garbage.
v3: GuC requires replaying the requests after a reset.
v4: Restore engine IRQ after reset (so waiters will be woken!)
Rearm hangcheck if resetting with a waiter.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk
Where we're going to continue regardless of the problem, rather than
fail, the message should be a WARNing rather than an ERROR.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Rather than walk the full array of engines checking whether each is in
the mask in turn, we can use the mask to jump to the right engines. This
should be quicker for a sparse array of engines or mask, whilst generating
smaller code:
text data bss dec hex filename
1251010 4579 800 1256389 132bc5 drivers/gpu/drm/i915/i915.ko
1250530 4579 800 1255909 1329e5 drivers/gpu/drm/i915/i915.ko
The downside is that we have to pass in a temporary, alas no C99
iterators yet.
[P.S. Joonas doesn't like having to pass extra temporaries into the
macro, and even less that I called them tmp. As yet, we haven't found a
macro that avoids passing in a temporary that is smaller. We probably
will get C99 iterators first!]
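One way such a macro can consume the mask directly, sketched with
illustrative names (and, as lamented above, an explicit temporary; tmp__
must be an unsigned long holding the remaining mask):

#include <linux/bitops.h>

/* Assumes an i915-like structure with an engine[] array indexed by id. */
#define example_for_each_engine_masked(engine__, i915__, mask__, tmp__) \
	for ((tmp__) = (mask__); \
	     (tmp__) ? ((engine__) = (i915__)->engine[__ffs(tmp__)], \
			(tmp__) &= (tmp__) - 1, 1) : 0; )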
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160827075401.16470-2-chris@chris-wilson.co.uk