We are now able to configure a pipeline directly into a local display
list body. Take advantage of this fact, and create a cacheable body to
store the configuration of the pipeline in the pipeline object.
vsp1_video_pipeline_run() is now the last user of the pipe->dl object.
Convert this function to use the cached pipe->stream_config body and
obtain a local display list reference.
Attach the pipe->stream_config body to the display list when needed
before committing to hardware.
Use a flag 'configured' to know when we should attach our stream_config
to the next outgoing display list to reconfigure the hardware in the
event of our first frame, or the first frame following a suspend/resume
cycle.
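The role of the 'configured' flag can be sketched in plain C as follows. This is only an illustration under stated assumptions: struct body, struct dl, struct pipeline, dl_add_body(), pipeline_run() and pipeline_resume() are hypothetical stand-ins, not the driver's actual types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BODIES 8

/* Hypothetical stand-ins for the driver's body and display list objects. */
struct body {
	unsigned int refcount;
};

struct dl {
	struct body *bodies[MAX_BODIES];
	size_t num_bodies;
};

struct pipeline {
	struct body *stream_config;	/* cached, built once per stream */
	bool configured;		/* false until the cache has been sent */
};

/* Attach a body to a list; the list takes its own reference. */
static int dl_add_body(struct dl *dl, struct body *body)
{
	if (dl->num_bodies >= MAX_BODIES)
		return -1;

	body->refcount++;
	dl->bodies[dl->num_bodies++] = body;
	return 0;
}

/* Prepare the next outgoing display list for one frame. */
static int pipeline_run(struct pipeline *pipe, struct dl *dl)
{
	assert(pipe->stream_config != NULL);

	/* Only the first frame after start or resume carries the cache. */
	if (!pipe->configured) {
		if (dl_add_body(dl, pipe->stream_config) < 0)
			return -1;
		pipe->configured = true;
	}

	return 0;
}

/* After a suspend/resume cycle the cached body must be sent again. */
static void pipeline_resume(struct pipeline *pipe)
{
	pipe->configured = false;
}
```

With this shape, only the first list after start or resume carries the cached body; every other frame commits just its runtime configuration.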
Our video DL usage now looks like the below output:
dl->body0 contains our disposable runtime configuration. Max 41.
dl_child->body0 is our partition specific configuration. Max 12.
dl->bodies shows our constant configuration and LUTs.
These two are LUT/CLU:
* dl->bodies[x]->num_entries 256 / max 256
* dl->bodies[x]->num_entries 4914 / max 4914
Which shows that our 'constant' configuration cache is currently
utilised to a maximum of 64 entries.
trace-cmd report | \
dl->body0->num_entries 13 / max 128
dl->body0->num_entries 14 / max 128
dl->body0->num_entries 16 / max 128
dl->body0->num_entries 20 / max 128
dl->body0->num_entries 27 / max 128
dl->body0->num_entries 34 / max 128
dl->body0->num_entries 41 / max 128
dl_child->body0->num_entries 10 / max 128
dl_child->body0->num_entries 12 / max 128
dl->bodies[x]->num_entries 15 / max 128
dl->bodies[x]->num_entries 16 / max 128
dl->bodies[x]->num_entries 17 / max 128
dl->bodies[x]->num_entries 18 / max 128
dl->bodies[x]->num_entries 20 / max 128
dl->bodies[x]->num_entries 21 / max 128
dl->bodies[x]->num_entries 256 / max 256
dl->bodies[x]->num_entries 31 / max 128
dl->bodies[x]->num_entries 32 / max 128
dl->bodies[x]->num_entries 39 / max 128
dl->bodies[x]->num_entries 40 / max 128
dl->bodies[x]->num_entries 47 / max 128
dl->bodies[x]->num_entries 48 / max 128
dl->bodies[x]->num_entries 4914 / max 4914
dl->bodies[x]->num_entries 55 / max 128
dl->bodies[x]->num_entries 56 / max 128
dl->bodies[x]->num_entries 63 / max 128
dl->bodies[x]->num_entries 64 / max 128
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Currently the entities store their configurations into a display list.
Adapt this so that entities can write their configuration into a body
directly, allowing greater flexibility and control of the content.
All users of vsp1_dl_list_write() are removed in this process, thus it
too is removed.
A helper, vsp1_dl_list_get_body0(), is provided to access the internal
body0 from the display list.
[laurent.pinchart+renesas@ideasonboard.com: Don't remove blank line unnecessarily]
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Extend the display list body with a reference count, allowing bodies to
be kept as long as a reference is maintained. This provides the ability
to keep a cached copy of bodies which will not change, so that they can
be re-applied to multiple display lists.
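A minimal sketch of the get/put pattern this describes, assuming a plain non-atomic counter for brevity (kernel code would normally use refcount_t or kref). The names struct body, body_init(), body_get() and body_put() are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical body object carrying a reference count. */
struct body {
	unsigned int refcount;
	unsigned int num_entries;
};

/* A freshly obtained body starts with one reference, held by its owner. */
static void body_init(struct body *body)
{
	body->refcount = 1;
	body->num_entries = 0;
}

/* Take an additional reference, e.g. when attaching to a display list. */
static void body_get(struct body *body)
{
	assert(body->refcount > 0);
	body->refcount++;
}

/*
 * Drop a reference. Returns true when the last reference went away and
 * the body has been reset so that it can be reused.
 */
static bool body_put(struct body *body)
{
	assert(body->refcount > 0);

	if (--body->refcount > 0)
		return false;

	body->num_entries = 0;
	return true;
}
```

A cached body keeps its contents for as long as its owner holds a reference, even while display lists that used it are recycled.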
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Adapt the dl->body0 object to use an object from the body pool. This
greatly reduces the pressure on the TLB for IPMMU use cases, as all of
the lists use a single allocation for the main body.
The CLU and LUT objects pre-allocate a pool containing three bodies,
allowing a userspace update before the hardware has committed a previous
set of tables.
Bodies are no longer 'freed' in interrupt context, but instead released
back to their respective pools. This allows us to remove the garbage
collector in the DLM.
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Each display list allocates a body to store register values in a
DMA-accessible buffer from a dma_alloc_wc() allocation. Each of these
results in an entry in the IOMMU TLB, and a large number of display list
allocations adds pressure to this resource.
Reduce TLB pressure on the IPMMUs by allocating multiple display list
bodies in a single allocation, and providing these to the display list
through a 'body pool'. A pool can be allocated by the display list
manager or entities which require their own body allocations.
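The pool idea can be sketched in user space as below. This is a simplified illustration, not the driver's implementation: calloc() stands in for dma_alloc_wc(), a linear scan replaces a proper free list, and struct body_pool, pool_create(), pool_get(), pool_put() and pool_destroy() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct entry {
	uint32_t addr;
	uint32_t data;
};

struct body {
	struct entry *entries;		/* slice of the pool's backing memory */
	unsigned int max_entries;
	int in_use;
};

struct body_pool {
	struct entry *mem;		/* the single backing allocation */
	struct body *bodies;
	unsigned int num_bodies;
};

static struct body_pool *pool_create(unsigned int num_bodies,
				     unsigned int num_entries)
{
	struct body_pool *pool;
	unsigned int i;

	assert(num_bodies > 0 && num_entries > 0);

	pool = calloc(1, sizeof(*pool));
	if (!pool)
		return NULL;

	pool->bodies = calloc(num_bodies, sizeof(*pool->bodies));
	/* One allocation covers the entries of every body in the pool. */
	pool->mem = calloc((size_t)num_bodies * num_entries, sizeof(*pool->mem));
	if (!pool->bodies || !pool->mem) {
		free(pool->bodies);
		free(pool->mem);
		free(pool);
		return NULL;
	}

	pool->num_bodies = num_bodies;
	for (i = 0; i < num_bodies; i++) {
		pool->bodies[i].entries = pool->mem + (size_t)i * num_entries;
		pool->bodies[i].max_entries = num_entries;
	}

	return pool;
}

/* Hand out the first unused body, or NULL when the pool is exhausted. */
static struct body *pool_get(struct body_pool *pool)
{
	unsigned int i;

	for (i = 0; i < pool->num_bodies; i++) {
		if (!pool->bodies[i].in_use) {
			pool->bodies[i].in_use = 1;
			return &pool->bodies[i];
		}
	}

	return NULL;
}

/* Return a body to its pool; no memory is freed here. */
static void pool_put(struct body *body)
{
	body->in_use = 0;
}

static void pool_destroy(struct body_pool *pool)
{
	if (!pool)
		return;

	free(pool->mem);
	free(pool->bodies);
	free(pool);
}
```

Because every body is a slice of one buffer, the number of mappings no longer grows with the number of bodies.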
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
The body write function relies on the code never asking it to write more
than the entries available in the list.
Currently, with each list body containing 256 entries, this is fine, but
we can reduce this number greatly, saving memory. In preparation for
this, add a level of protection to catch any buffer overflows.
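The kind of check meant here can be sketched as follows. The types and the body_write() name are hypothetical, and the sketch reports the overflow through a return value and stderr; kernel code would more likely emit a one-time warning with WARN_ONCE().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct entry {
	uint32_t addr;
	uint32_t data;
};

struct body {
	struct entry *entries;
	unsigned int num_entries;
	unsigned int max_entries;
};

/* Append one register write, refusing to go past the end of the body. */
static int body_write(struct body *body, uint32_t reg, uint32_t data)
{
	assert(body != NULL && body->entries != NULL);

	if (body->num_entries >= body->max_entries) {
		fprintf(stderr, "body size exceeded (max %u)\n",
			body->max_entries);
		return -1;
	}

	body->entries[body->num_entries].addr = reg;
	body->entries[body->num_entries].data = data;
	body->num_entries++;

	return 0;
}
```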
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Throughout the codebase, the term 'fragment' is used to represent a
display list body. This term duplicates the 'body' which is already in
use.
The datasheet references these objects as a body, therefore replace all
mentions of a fragment with a body, along with the corresponding
pluralised terms.
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Adopt the SPDX license identifier headers to ease license compliance
management. All files in the driver are licensed under the GPLv2+ except
for the vsp1_regs.h file which is licensed under the GPLv2. This is
likely an oversight, but fixing this requires contacting the copyright
owners and is out of scope for this patch.
While at it, fix the file descriptions to match file names where copy
and paste errors occurred.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Display list completion is already reported to the frame end handler,
but that mechanism is global to all display lists. In order to implement
BRU and BRS reassignment in DRM pipelines we will need to commit a
display list and wait for its completion internally, without reporting
it to the DRM driver. Extend the display list API to support such an
internal use of the display list.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
We will soon need to return more than a boolean completion status from
the vsp1_dlm_irq_frame_end() IRQ handler. Turn the return value into a
bitfield to prepare for that. No functional change is introduced here.
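As an illustration of the conversion, a boolean result can be turned into flag bits like this. The flag names, struct dlm_state and both helpers are hypothetical; the second flag only shows how a further status could be reported later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status bits returned by the frame end handler. */
#define DL_FRAME_END_COMPLETED	(1u << 0)
#define DL_FRAME_END_INTERNAL	(1u << 1)	/* example of a later addition */

struct dlm_state {
	bool completed;		/* the active list finished this frame */
	bool internal;		/* the list was committed for internal use */
};

/* Build the flag word instead of returning a single boolean. */
static unsigned int frame_end_status(const struct dlm_state *state)
{
	unsigned int flags = 0;

	assert(state != NULL);

	if (state->completed)
		flags |= DL_FRAME_END_COMPLETED;
	if (state->internal)
		flags |= DL_FRAME_END_INTERNAL;

	return flags;
}

/* A caller that only cares about completion tests the relevant bit. */
static bool frame_completed(unsigned int flags)
{
	return (flags & DL_FRAME_END_COMPLETED) != 0;
}
```

Existing callers keep testing a single bit, so adding further bits later does not change their behaviour.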
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
To allow dual pipelines utilising two WPF entities when available, the
VSP was updated to support header-mode display list in continuous
pipelines.
A small bug in the status check of the command register causes the
second pipeline to be directly affected by the running of the first,
which appears as a performance issue with a stuttering display.
Fix the vsp1_dl_list_hw_update_pending() call to ensure that the read
comparison corresponds to the correct pipeline.
Fixes: eaf4bfad6a ("v4l: vsp1: Add support for header display lists in continuous mode")
Cc: "Stable v4.14+" <stable@vger.kernel.org>
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Fix this warning:
drivers/media/platform/vsp1/vsp1_dl.c:87: warning: No description found for parameter 'has_chain'
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
The VSP supports both header and headerless display lists. The latter is
easier to use when the VSP feeds data directly to the DU in continuous
mode, and the driver thus uses headerless display lists for DU operation
and header display lists otherwise.
Headerless display lists are only available on WPF.0. This has never
been an issue so far, as only WPF.0 is connected to the DU. However, on
H3 ES2.0, the VSP-DL instance has both WPF.0 and WPF.1 connected to the
DU. We thus can't use headerless display lists unconditionally for DU
operation.
Implement support for continuous mode with header display lists, and use
it for DU operation on WPF outputs that don't support headerless mode.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
When the display start interrupt occurs, we know that the hardware has
finished loading the active display list. The driver then proceeds to
recycle the list, assuming it won't be needed anymore.
This assumption holds true for headerless display lists, as the VSP
doesn't reload the list for the next frame if it hasn't changed.
However, this isn't true anymore for header display lists, as they are
loaded at every frame start regardless of whether they have been
updated.
To prepare for header display lists usage in display pipelines, we need
to postpone recycling the list until it gets replaced by a new one
through a page flip. The driver already does so in the frame end
interrupt handler, so all we need is to skip list recycling in the
display start interrupt handler.
While the active list can be recycled at display start for headerless
display lists, there's no real harm in postponing that to the frame end
interrupt handler in all cases. This simplifies interrupt handling as we
don't need to process the display start interrupt anymore.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
The display list headers are filled using information from the display
list only. Lower the display list manager spinlock contention by filling
the headers without holding the lock.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
On Gen2 hardware the VSP1 is a bus master and accesses the display list
and video buffers through DMA directly. On Gen3 hardware, however,
memory accesses go through a separate IP core called FCP.
The VSP1 driver unconditionally maps DMA buffers through the VSP device.
While this hasn't caused any practical issue so far, DMA mappings will
be incorrect as soon as we enable IOMMU support for the FCP on Gen3
platforms, resulting in IOMMU faults.
Fix this by mapping all buffers through the FCP device if present, and
through the VSP1 device as usual otherwise.
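The selection logic amounts to the following sketch. struct device here is a local stand-in rather than the kernel's struct device, and struct vsp_ctx and select_bus_master() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for a device handle. */
struct device {
	const char *name;
};

struct vsp_ctx {
	struct device *dev;		/* the VSP itself */
	struct device *fcp;		/* NULL when no FCP is present */
	struct device *bus_master;	/* cached device used for DMA mappings */
};

/* Pick the device that actually masters the bus and cache the result. */
static void select_bus_master(struct vsp_ctx *ctx)
{
	assert(ctx->dev != NULL);

	ctx->bus_master = ctx->fcp ? ctx->fcp : ctx->dev;
}
```

Caching the choice once means every later buffer mapping can use the same pointer without repeating the check.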
Suggested-by: Magnus Damm <magnus.damm@gmail.com>
[Cache the bus master device]
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
If we try to commit the display list while an update is pending, we have
missed our opportunity. The display list manager will hold the commit
until the next interrupt.
In this event, we skip the pipeline completion callback handler so that
the pipeline will not mistakenly report frame completion to the user.
Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Fix all multi-line comments to comply with the kernel coding style.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Several multi-line comments added in the vsp1 patch series
violate the Kernel CodingStyle. Fix them.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
When display lists are linked in a chain, they will be processed
automatically by the hardware, with each list linking to the next. Only
on the last display list will the frame end interrupt be fired to mark
the completion event.
Upon frame-end, the chain will be iterated to release each display list
back to the free list.
The chained lists use case (image partitioning) can require up to 64
lists per frame in the worst case scenario. Bump up the number of
preallocated lists accordingly.
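Releasing a chain at frame end can be pictured with the sketch below. It uses hand-rolled singly linked lists and hypothetical names (struct dl, struct dlm, dlm_release_chain()); the driver itself would use the kernel's list helpers.

```c
#include <assert.h>
#include <stddef.h>

struct dl {
	struct dl *chain_next;	/* next list in the same frame's chain */
	struct dl *free_next;	/* link in the manager's free list */
};

struct dlm {
	struct dl *free;	/* head of the free list */
	unsigned int num_free;
};

/* Walk the chain from its head and return every list to the free list. */
static void dlm_release_chain(struct dlm *dlm, struct dl *head)
{
	assert(dlm != NULL);

	while (head) {
		struct dl *next = head->chain_next;

		head->chain_next = NULL;
		head->free_next = dlm->free;
		dlm->free = head;
		dlm->num_free++;

		head = next;
	}
}
```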
Signed-off-by: Kieran Bingham <kieran+renesas@bingham.xyz>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Freeing a fragment requires freeing DMA coherent memory, which can't be
performed with interrupts disabled as per the DMA mapping API contract.
The fragments can't thus be freed synchronously when a display list is
recycled. Instead, move the fragments to a garbage list and use a work
queue to run the garbage collection.
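The deferral can be sketched in user space as follows, assuming free() as a stand-in for releasing DMA coherent memory and an explicit gc_run() call in place of a work queue. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct fragment {
	struct fragment *next;
	void *mem;		/* stand-in for the DMA coherent buffer */
};

struct dlm {
	struct fragment *gc_list;	/* fragments waiting to be freed */
	unsigned int freed;
};

/* Cheap enough for interrupt context: only relinks pointers. */
static void fragment_defer_free(struct dlm *dlm, struct fragment *frag)
{
	assert(frag != NULL);

	frag->next = dlm->gc_list;
	dlm->gc_list = frag;
}

/* Runs later, from a context where freeing memory is allowed. */
static void gc_run(struct dlm *dlm)
{
	struct fragment *frag = dlm->gc_list;

	dlm->gc_list = NULL;

	while (frag) {
		struct fragment *next = frag->next;

		free(frag->mem);
		free(frag);
		dlm->freed++;

		frag = next;
	}
}
```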
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Display lists support up to 8 bodies but we currently use a single one.
To support preparing display lists for large look-up tables, add support
for multi-body display lists.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
The vsp1_dl_list_put() function expects to be called with the display
list manager lock held. This assumption is correct for calls from within
the vsp1_dl.c file, but not for the external calls. Fix it by taking the
lock inside the function and providing an unlocked version for the
internal callers.
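The locked/unlocked split looks roughly like this. The lock is simulated with a boolean so that the contract can be asserted, and dl_put() and dl_put_locked() are hypothetical names (kernel code often marks the unlocked variant with a leading double underscore instead).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated lock: tracks ownership so the contract can be checked. */
struct lock {
	bool held;
};

static void lock_acquire(struct lock *lock)
{
	assert(!lock->held);
	lock->held = true;
}

static void lock_release(struct lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

struct dlm {
	struct lock lock;
	unsigned int num_free;
};

struct dl {
	struct dlm *dlm;
};

/* Internal variant: the caller must already hold dlm->lock. */
static void dl_put_locked(struct dl *dl)
{
	assert(dl->dlm->lock.held);
	dl->dlm->num_free++;
}

/* External variant: takes the lock itself, then defers to the helper. */
static void dl_put(struct dl *dl)
{
	if (!dl)
		return;

	lock_acquire(&dl->dlm->lock);
	dl_put_locked(dl);
	lock_release(&dl->dlm->lock);
}
```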
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
The field takes positive values only; make it unsigned.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Don't restrict display list usage to the DRM pipeline, use them
unconditionally. This prepares the driver to support the request API.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Display lists can operate in header or headerless mode. The headerless
mode is only available on WPF0, to be used with the display engine. All
other WPF instances can only use display lists in header mode.
Implement support for header mode to prepare for display list usage on
WPFs other than 0.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Each WPF can process display lists independently, move the manager to
the WPF to reflect that and prepare for display list support for non-DRM
pipelines.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
This clarifies the API and prepares display list support for being used
to implement the request API.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Make sure display list usage is correctly disabled by always setting up
the corresponding registers, including when the display list feature
isn't used.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
I noticed this while merging the drm tree and checking for stragglers:
the vsp1 driver still used dma_[alloc|free]_writecombine() that got
renamed in commit f6e45661f9 ("dma, mm/pat: Rename
dma_*_writecombine() to dma_*_wc()")
I should have noticed back in the media merge (commit bace3db5da), but
better late than never.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Display lists contain lists of registers and associated values to be
applied atomically by the hardware. They lower the pressure on interrupt
processing delays when reprogramming the device as settings can be
prepared well in advance and queued to the hardware without waiting for
the end of the current frame.
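Conceptually, a display list is an array of (register, value) pairs that is prepared ahead of time and applied in one go. The sketch below models this with a small fake register file; NUM_REGS, struct display_list, dl_write() and hw_frame_start() are hypothetical and do not reflect the hardware layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 8

struct entry {
	uint32_t addr;		/* register byte offset */
	uint32_t data;		/* value to program */
};

struct display_list {
	struct entry entries[NUM_REGS];
	size_t num_entries;
};

/* Queue a register write; nothing reaches the hardware yet. */
static void dl_write(struct display_list *dl, uint32_t addr, uint32_t data)
{
	assert(dl->num_entries < NUM_REGS);

	dl->entries[dl->num_entries].addr = addr;
	dl->entries[dl->num_entries].data = data;
	dl->num_entries++;
}

/* Stand-in for the hardware: apply every queued entry at frame start. */
static void hw_frame_start(uint32_t regs[NUM_REGS], struct display_list *dl)
{
	size_t i;

	for (i = 0; i < dl->num_entries; i++) {
		assert(dl->entries[i].addr % 4 == 0);
		assert(dl->entries[i].addr / 4 < NUM_REGS);
		regs[dl->entries[i].addr / 4] = dl->entries[i].data;
	}

	dl->num_entries = 0;
}
```

The registers stay untouched while the list is being filled, which is what allows settings to be prepared well before the frame boundary.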
Display list support is currently limited to the DRM pipeline.
Signed-off-by: Koji Matsuoka <koji.matsuoka.xm@renesas.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>