This moves the code to generate commands for the ACR unit of the PMU/SEC2 LS
firmwares to those subdevs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Takes the command queue pointer directly instead of requiring a function to
lookup based on an queue type, as well as an explicit timeout value.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Functions implementing FW commands had to implement this themselves, let's
move that to common code and plumb the return code from callbacks through.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Code to interface with LS firmwares is being moved to the subdevs where it
belongs, rather than living in the common falcon code.
Arbitrary private data passed to callbacks is to allow for something other
than struct nvkm_msgqueue to be passed into the callback (like the pointer
to the subdev itself, for example), and the return code will be used where
we'd like to detect failure from synchronous messages.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Code to interface with LS firmwares is being moved to the subdevs where it
belongs, rather than living in the common falcon code.
This is an incremental step towards that goal.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Code to interface with LS firmwares is being moved to the subdevs where it
belongs, rather than living in the common falcon code.
This is an incremental step towards that goal.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Code to interface with LS firmwares is being moved to the subdevs where it
belongs, rather than living in the common falcon code.
This is an incremental step towards that goal.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
To make things clearer while modifying the interfaces, split msgqueue into
Queue Manager, Command Queue, and Message Queue.
There should be no code changes here, these will be done incrementally.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
NVDEC is available from GM107, and we currently only have a stub
implementation anyway, let's make it explicit.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will allow us to register the falcon with ACR, and further customise
its behaviour by providing the nvkm_falcon_func structure directly.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will allow us to register the falcon with ACR, and further customise
its behaviour by providing the nvkm_falcon_func structure directly.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
ACR LS FW loading is moving out of SECBOOT and into their specific subdevs,
and the available GP108/GV100 FWs differ from the other GP10x boards.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will allow us to register the falcon with ACR, and further customise
its behaviour by providing the nvkm_falcon_func structure directly.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
ACR LS FW loading is moving out of SECBOOT and into their specific subdevs,
and the available GP107/GP108 FWs have interface differences.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will allow us to register the falcon with ACR, and further customise
its behaviour by providing the nvkm_falcon_func structure directly.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
ACR LS FW loading is moving out of SECBOOT and into their specific subdevs,
and the available GM20B/GP10B FWs have interface differences.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will allow us to register the falcon with ACR, and further customise
its behaviour by providing the nvkm_falcon_func structure directly.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
PMU, SEC2 and GR will be modified to register their falcons with ACR before
the main commit switching everything over.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Will be used in upcoming commits to allow subdevs to better customise
themselves based on which (if any) firmware is available.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We want to be able to register falcons with ACR during the constructor for
the subdev it belongs to, however, we may not have access to the falcon's
registers prior to DEVINIT.
Delay touching registers until the first time the falcon is acquired.
This may temporarily break secboot on non-production boards due to not
being able to determine whether the falcon is in debug or production mode,
the new ACR subdev will not have this issue, and it's not a use-case that's
terribly important for bisectability.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
drivers/gpu/drm/nouveau/nouveau_ttm.c: In function nouveau_vram_manager_new:
drivers/gpu/drm/nouveau/nouveau_ttm.c:66:22: warning: variable mem set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/nouveau/nouveau_ttm.c: In function nouveau_gart_manager_new:
drivers/gpu/drm/nouveau/nouveau_ttm.c:106:22: warning: variable mem set but not used [-Wunused-but-set-variable]
They are not used any more, so remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Like other cases, it should use rcu protected 'chan' rather
than 'fence->channel' in nouveau_fence_wait_uevent_handler.
Fixes: 0ec5f02f0e ("drm/nouveau: prevent stale fence->channel pointers, and protect with rcu")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Method init is typically ordered by class in the FW image as ThreeD,
TwoD, Compute.
Due to a bug in parsing the FW into our internal format, we've been
accidentally sending Twod + Compute methods to the ThreeD class, as
well as Compute methods to the TwoD class - oops.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We accidentally set "psb" which is a no-op instead of "*psb" so it
generates a static checker warning. We should probably set it before
the first error return so that it's always initialized.
Fixes: 923f1bd27b ("drm/nouveau/secboot/gm20b: add secure boot support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Turing introduced a new simplified page kind
scheme, reducing the number of possible page
kinds from 256 to 16. It also is the first
NVIDIA GPU in which the highest possible page
kind value is not reserved as an "invalid" page
kind.
To address this, the invalid page kind is made
an explicit property of the MMU HAL, and a new
table of page kinds is added to the tu102 MMU
HAL.
One hardware change not addressed here is that
0x00 is technically no longer a supported page
kind, and pitch surfaces are instead intended to
share the block-linear generic page kind 0x06.
However, because that will be a rather invasive
change to nouveau and 0x00 still works fine in
practice on Turing hardware, addressing this new
behavior is deferred.
Signed-off-by: James Jones <jajones@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The pointer used to walk the table of move ops
and pick the right one for the current GPU was
declared static, meaning its state was carried
over between invocations of the function, and also
made the function non-rentrant and thread-unsafe.
Since the table is ordered such that newer GPU
methods are listed first, the result of this was
that initializing newer GPUs after older GPUs
would result in no suitable ttm move acceleration
operations being found, and ttm would fall back
to CPU blits on the older GPUs.
This change declares the walking pointer
separately from the table and makes it non-static
to fix the logic.
Signed-off-by: James Jones <jajones@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Replace the use of 0 in the pointer assignment with NULL to address the
following sparse warning:
drivers/gpu/drm/nouveau/nouveau_hwmon.c:744:29: warning: Using plain integer as NULL pointer
Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The local variable `pclks` is defined and set but not used and can
therefore be removed.
Issue found by coccinelle.
Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Explicitly declare constants as unsigned long long to address the
following sparse warnings:
warning: constant is so big it is long
v2: convert to unsigned long long for compatibility with 32-bit
architectures.
Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com>
Suggested by: lia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
match_string() returns the array index of a matching string.
Use it instead of the open-coded implementation.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
gp10b uses the new engine enumeration mechanism introduced in the Pascal
architecture. As a result, the copy engine, which used to be at index 2
for prior Tegra GPU instantiations, has now moved to index 0. Fix up the
index and also use the gp100 variant of the copy engine class because on
gp10b the PASCAL_DMA_COPY_B class is not supported.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There are extra registers that need to be programmed to make the level 2
cache work on GP10B, such as the stream ID register that is used when an
SMMU is used to translate memory addresses.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The GPUs found on Tegra SoCs have registers that can be used to read the
WPR configuration. Use these registers instead of reaching into the
memory controller's register space to read the same information.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
If the GPU clock has not had a rate set, initialize it to the maximum
clock rate to make sure it does run.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
When the GPU powergate is controlled by a generic power domain provider,
the reset will automatically be asserted and deasserted as part of the
power-ungating procedure.
On some Jetson TX2 boards, doing an additional assert and deassert of
the GPU outside of the power-ungate procedure can cause the GPU to go
into a bad state where the memory interface can no longer access system
memory.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
When Nouveau is instantiated on top of a platform device, the dev->pdev
field will be NULL and calling pci_disable_device() will crash. Move the
PCI disabling code to the PCI specific driver removal code.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There is no BAR2 on GP10B and there is no need to map through BAR2
because all memory is shared between the GPU and the CPU. Add a custom
implementation of the fault sub-device that uses nvkm_memory_addr()
instead of nvkm_memory_bar2() to return the address of a pinned fault
buffer.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The sizeof is currently on args.src and args.dst and should be on
*args.src and *args.dst. Fortunately these sizes just so happen
to be the same size so it worked, however, this should be fixed
and it also cleans up static analysis warnings
Addresses-Coverity: ("sizeof not portable")
Fixes: f268307ec7 ("nouveau: simplify nouveau_dmem_migrate_vma")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This commit is seperate from the previous one to make it easier to
revert in the future. Basically, while working on making MSTOs per-head
as opposed to per-head-per-connector I discovered these lovely issues:
https://gitlab.freedesktop.org/xorg/xserver/merge_requests/277https://gitlab.gnome.org/GNOME/mutter/issues/759
Note as well that Intel already has a temporary workaround for this in
their kernel driver. So, unfortunately we need to follow suit to avoid
causing a regression in userspace. Once these issues get fixed, this
commit should be reverted.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Currently, for every single MST capable DRM connector we create a set of
fake encoders, one for each possible head. Unfortunately this ends up
being a huge waste of encoders. While this currently isn't causing us
any problems, it's extremely close to doing so.
The ThinkPad P71 is a good example of this. Originally when trying to
figure out why nouveau was failing to load on this laptop, I discovered
it was because nouveau was creating too many encoders. This ended up
being because we were mistakenly creating MST encoders for the eDP port,
however we are still extremely close to hitting the encoder limit on
this machine as it exposes 1 eDP port and 5 DP ports, resulting in 31
encoders.
So while this fix didn't end up being necessary to fix the P71, we still
need to implement this so that we avoid hitting the encoder limit for
valid display configurations in the event that some machine with more
connectors then this becomes available. Plus, we don't want to let good
code go to waste :)
So, use less encoders by only creating one MSTO per head. Then, attach
each new MSTC to each MSTO which corresponds to a head that it's parent
DP port is capable of using. This brings the number of encoders we
register on the ThinkPad P71 from 31, down to just 15. Yay!
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
When drm_connector_helper_funcs->atomic_best_encoder is defined,
->best_encoder is ignored by the atomic modesetting helpers. That being
said, this hook is completely broken anyway - it always returns the
first msto for a given mstc, despite the fact it might already be in
use.
So, just get rid of it. We'll need this in a moment anyway, when we make
mstos per-head as opposed to per-connector.
Changes since v1:
* Fix typo in documentation - imirkin
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The hardware supports either size. Also add checks to ensure that only
these two sizes may be used for supplying a LUT.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Commit 5fde30a2684041f9820aa9dc4fbd0009a45076a9 in envytools modified
some of the Falcon V5 encodings, regenerate the relevant FW with this.
Also modify build rules to include SPDX header in generated files.
Tested on GM107, with no issues noted.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
[Why]
When change the connection status in a MST topology, mst device
which detect the event will send out CONNECTION_STATUS_NOTIFY messgae.
e.g. src-mst-mst-sst => src-mst (unplug) mst-sst
Currently, under the above case of unplugging device, ports which have
been allocated payloads and are no longer in the topology still occupy
time slots and recorded in proposed_vcpi[] of topology manager.
If we don't clean up the proposed_vcpi[], when code flow goes to try to
update payload table by calling drm_dp_update_payload_part1(), we will
fail at checking port validation due to there are ports with proposed
time slots but no longer in the mst topology. As the result of that, we
will also stop updating the DPCD payload table of down stream port.
[How]
While handling the CONNECTION_STATUS_NOTIFY message, add a detection to
see if the event indicates that a device is unplugged to an output port.
If the detection is true, then iterrate over all proposed_vcpi[] to
see whether a port of the proposed_vcpi[] is still in the topology or
not. If the port is invalid, set its num_slots to 0.
Thereafter, when try to update payload table by calling
drm_dp_update_payload_part1(), we can successfully update the DPCD
payload table of down stream port and clear the proposed_vcpi[] to NULL.
Changes since v1:(https://patchwork.kernel.org/patch/11275801/)
* Invert the conditional to reduce the indenting
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
[removed cc for stable - there's too many patches this depends on for
this to backport cleanly]
Link: https://patchwork.freedesktop.org/patch/msgid/20200106102158.28261-1-Wayne.Lin@amd.com
If driver debugfs files are accessed, power up the GPU
when necessary.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If power management sysfs or debugfs files are accessed,
power up the GPU when necessary.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: 4dee6e4ca5 ("drm/amdgpu: use linux size macro to simplify ONE_Kib & One_Mib")
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The conversion to bool is not needed, remove it.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
read_current_link_settings_on_detect() on eDP 1.4+ may use the
edp_supported_link_rates table which is set up by
detect_edp_sink_caps(), so that function needs to be called first.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Martin Leung <martin.leung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
So that it gets included in the initrd. At the moment
this is optional firmware that contains support for HDCP.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On Arcturus, data fabric hashing is set by the VBIOS, and
affects which addresses map to which memory channels. The
gfx core's caches also need to know this mapping, but the
hash settings for these these caches is set by the driver.
This change queries the DF to understand how the VBIOS
configured DF, then matches the TC hash configuration bits
to do the same thing.
v2: squash in warning fix
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On Arcturus, we need TC channel hashing, which is set by the
driver, to match DF hashing, which is set by VBIOS. To match
these, we plan to query the DF information and then properly
set the TC configuration bits to match them.
This patch adds the required fields to register definitions
in preparation for a future patch which will use them.
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The only data fabric information the adev struct currently
contains is a function pointer table. In the near future,
we will be adding some cached DF information into adev. As
such, this patch creates a new amdgpu_df struct for adev.
Right now, it only containst the old function pointer table,
but new stuff will be added soon.
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
Only a single voltage level should be available to Pollock (min level)
Pollock & Dali get misidentified as Renoir, use wrong clk mgr constructor
[HOW]
Add provided Pollock IDs to ASIC Rev. ID list.
Create new Pollock ASIC RID check, fix RV2 & Dali ASIC checks.
Check RID and set max voltage level to 0 if Pollock is detected.
Work around broken ASICREV_IS_RENOIR, IS_RAVEN2, etc. checks by
performing Dali/Pollock checks before they can be misidentified as RN.
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
between UMC RAS err register access restore previous RSMU UMC index mode state
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
in event of GPU reset, XGMI TA unload causes unrecoverable GPU hang
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update mmSDMA0_UTCL1_WATERMK golden setting for renoir.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch updates SMU12_DRIVER_IF_VERSION to 11.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We don't need to store the pre-OS console memory after
the driver has loaded so free it.
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leftover from bring up. We look up the actual pre-OS memory usage
value later in the same function.
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It won't get used unless the driver allows the gtt domain for
display buffers which is controlled elsewhere.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It should work on all Raven variants, but some users have
reported issues with original Raven with IOMMU enabled.
So far there have been no issues observed with PCO or RV2.
v2: split out the dm init and domain changes into separate
patches.
Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It won't get used unless the driver allows the gtt domain for
display buffers which is controlled elsewhere.
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
All of the sdma stuff these were used for moves to
the sdma code, so remove them.
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(v2): Fix preprocessor tag
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
sdma ras funcs are not supported by ASIC prior
to vega20
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hardcoded offset is not friendly. And another benifit of this
patch is to keep read and write access to this register be
consistent with other similar UMC regsiters in this file.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Both are needed on vega20 and arcturus chip.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cast to make min() happy. The values are well within
range.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
change the sw ctf setting to smu_v11_0_set_thermal_range()
since software_shutdown_temp shares the same definition and
name in all the smu11 project.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
guest vm gets 0xffffffff when reading RCC_DEV0_EPF0_STRAP0,
as a consequence, the rev_id and external_rev_id are wrong.
workaround it by hardcoding the rev_id to 0, which is the default value.
v2. add comment in the code
Signed-off-by: Tiecheng Zhou <Tiecheng.Zhou@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
There is a use case that link loss happens accidentally,
and we need to recover that link loss as soon as possible.
Under this circumstance, we will perform link training,
and try to recover the link that's just lost.
However, if link PHY is disabled before link training
happens, then DP display will never come back again.
Also, please note that dropping this disable_phy function
call won't break USB-C hotplug functionality.
(This line of code was firstly introduced associated with
a patch to fix USB-C hotplug issue)
[How]
Don't disable DP transmitter and its encoder before link
training happens, even if link loss is detected.
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SDMA edc counter registers were added in gfx edc counters
array. When querying gfx error counter in that array, there
is no way to differentiate sdma instance number for different
asic and then results to NULL pointer access when trying to
read sdma register base address for instances greater
than 2 on Vega20.
In addition, this also results to wrong gfx error counters
since it actually added sdma edc counters.
Therefore, sdma edc counter registers should be separated
from gfx edc counter regsiter array and only get initialized
when driver tries to enable sdma ras.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
move ras_late_init and ras_fini to sdma_ras_funcs table
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
invoke sdma query_ras_error_count to get sdma single
bit error count
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
query_ras_error_count function will be invoked to query
single bit error count detected in sdma ip block
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With default PSP FW loading
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ucodes for instances are from different location
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Do not ignore amdgpu_irq_add_id return value while registering
VMC page fault interrupt.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This can save users much troubles. As they do not
actually need to care whether swSMU or traditional
powerplay routine should be used.
V2: apply the fixes to vi.c and cik.c also
V3: squash in oops fix
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The SOC15_REG_OFFSET() macro needs to dereference adev->reg_offset[IP]
pointer, which is sometimes NULL when there are fewer than 8 sdma engines.
Avoid that by not initializing the array regardless.
v2: squash in warning fixes
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We use PCI device path in the registered PMU name in order to distinguish
between multiple GPUs. But since tools/perf reserves a special meaning to
dash and colon characters we need to transliterate them to something else.
We choose an underscore.
v2:
* Use strreplace. (Chris)
* Dashes are not good either. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Fixes: 05488673a4 ("drm/i915/pmu: Support multiple GPUs")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200110113253.12535-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit aebf3b521b)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
CVEID: CVE-2019-14615
Summary of Vulnerability
------------------------
Insufficient control flow in certain data structures for some Intel(R)
Processors with Intel Processor Graphics may allow an unauthenticated
user to potentially enable information disclosure via local access
Products affected:
------------------
Intel CPU’s with Gen7, Gen7.5 and Gen9 Graphics.
Public Disclosure Schedule:
---------------------------
Intel is pursuing a coordinated disclosure of this vulnerability.
The targeted public disclosure date is January 14 2020
Mitigation Summary
------------------
This patch provides mitigation for Gen9 hardware only.
Patches for Gen7 and Gen7.5 will be provided later.
Note that Gen8 is not impacted due to a previously implemented
workaround.
The mitigation involves using an existing hardware feature to forcibly
clear down all EU state at each context switch.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJeGHjXAAoJEID/Kx9323OZezwH/iLlbczb6HW7AbloQVa7KRNL
cZ4VHHXmMEQPSprxFuOS21/hVW1rKZzbjTGGI0qbm4qNT3LiK92E0dcoMs1Tp9Xd
eElZpkeO36pqdxc/a256N3xrpmhiMnmk33F36k4qGpt6YUxvFUyZ50re0e3pO03j
wGJ1cMIbAKJQmMC23yQdD44y1TH32fGeUQvwbLgktHAS/r1DxqyaZZq1hSpOiZdV
TqhFLQAXUw2Cxy3FmF7KgcedcZfii1Rq5Gz7iQeyix3CbNM9r+1UGqsjGacDcXS9
/GxhBCSKf35pOj7ZxgtLPCCdL5mSAtvQO/E+yLx3F9axG9bzzNGkLpEsWeCshp8=
=3jTf
-----END PGP SIGNATURE-----
Merge tag 'Intel-CVE-2019-14615' from bundle by Akeem Abodunrin.
Merge Intel Gen9 graphics fix from Akeem Abodunrin:
"Insufficient control flow in certain data structures for some Intel
Processors with Intel Processor Graphics may allow an unauthenticated
user to potentially enable information disclosure via local access
This provides mitigation for Gen9 hardware. Note that Gen8 is not
impacted due to a previously implemented workaround.
The mitigation involves using an existing hardware feature to forcibly
clear down all EU state at each context switch"
* tag 'Intel-CVE-2019-14615' of emailed bundle from Akeem G Abodunrin <akeem.g.abodunrin@intel.com>:
drm/i915/gen9: Clear residual context state on context switch
My compiler yells:
.../drivers/gpu/drm/msm/adreno/adreno_gpu.c:69:27:
error: '/*' within block comment [-Werror,-Wcomment]
Let's fix.
Fixes: 6a0dea02c2 ("drm/msm: support firmware-name for zap fw (v2)")
Link: https://patchwork.freedesktop.org/patch/348519/
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Currently, we reset the timer after a pre-eemption event. This has the
side-effect that the timeslice runs into the second context after the
first is completed after a normal promotion event, causing the second
context to be swapped out early and switched for a third context. To be
more fair, we want to reset the clock after promotion as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200113214546.1990139-1-chris@chris-wilson.co.uk
If 'CONFIG_DRM_I915_CAPTURE_ERROR' not configured, there is an error
when compile the kernel:
drivers/gpu/drm/i915/gt/intel_reset.c:
In function intel_gt_handle_error:
drivers/gpu/drm/i915/gt/intel_reset.c:1233:3:
error: too few arguments to function i915_capture_error_state
i915_capture_error_state(gt->i915);
^~~~~~~~~~~~~~~~~~~~~~~~
In file included
from ./drivers/gpu/drm/i915/i915_drv.h:97:0,
from ./drivers/gpu/drm/i915/display/intel_display_types.h:46,
from drivers/gpu/drm/i915/gt/intel_reset.c:10:
./drivers/gpu/drm/i915/i915_gpu_error.h:267:20: note: declared here
static inline void i915_capture_error_state(struct drm_i915_private *dev_priv,
Fixes: 742379c0c4 ("drm/i915: Start chopping up the GPU error capture")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200113081942.15982-1-zhangxiaoxu5@huawei.com
If 'CONFIG_DRM_I915_CAPTURE_ERROR' not configured, there are some
errors like:
drivers/gpu/drm/i915/i915_irq.o:
In function `i915_vma_capture_finish':
./drivers/gpu/drm/i915/i915_gpu_error.h:312:
multiple definition of `i915_vma_capture_finish'
drivers/gpu/drm/i915/i915_drv.o:
./drivers/gpu/drm/i915/i915_gpu_error.h:312: first defined here
So, add 'static inline' on the defineation of the 'i915_vma_capture_finish'
Fixes: d713e3ab93fdc("drm/i915: Correct typo in i915_vma_compress_finish stub")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200113104009.13274-1-zhangxiaoxu5@huawei.com
For newer devices we want to require the path to come from the
firmware-name property in the zap-shader dt node.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Since zap firmware can be device specific, allow for a firmware-name
property in the zap node to specify which firmware to load, similarly to
the scheme used for dsp/wifi/etc.
v2: only need a single error msg when we can't load from firmware-name
specified path, and fix comment [Bjorn A.]
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Life is usually easier when we pass around intel_ types instead
of drm_ types. In this case it might not be, but I think being
consistent is a good thing anyway. Also some of this might get
cleaned up a bit more later as we keep propagating the intel_
types further.
@find@
identifier F =~ "^intel_attached_.*";
identifier C;
@@
F(struct drm_connector *C)
{
...
}
@@
identifier find.F;
identifier find.C;
@@
F(
- struct drm_connector *C
+ struct intel_connector *connector
)
{
<...
- C
+ &connector->base
...>
}
@@
identifier find.F;
expression C;
@@
- F(C)
+ F(to_intel_connector(C))
@@
expression C;
@@
- to_intel_connector(&C->base)
+ C
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191204180549.1267-3-ville.syrjala@linux.intel.com
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
commit 2589c4025f ("drm/rockchip: Avoid drm_dp_link helpers") changes
the type of variables used to store the display port data rate and
number of lanes to u8. However u8 is not sufficient to store the link
data rate of the display port.
This commit reverts the type of data rate to unsigned int.
Fixes: 2589c4025f ("drm/rockchip: Avoid drm_dp_link helpers")
Signed-off-by: Tobias Schramm <t.schramm@manjaro.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20200109073129.378507-2-t.schramm@manjaro.org
As we use the active state to keep the vma alive while we are reading
its contents during GPU error capture, we need to mark the
ring->vma as active during execution if we want to include the rinbuffer
in the error state.
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b1e3177bd1 ("drm/i915: Coordinate i915_active with its own mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200110110402.1231745-3-chris@chris-wilson.co.uk
(cherry picked from commit 8ccfc20a7d)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
As we use the active state to keep the vma alive while we are reading
its contents during GPU error capture, we need to mark the
context->state vma as active during execution if we want to include it
in the error state.
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b1e3177bd1 ("drm/i915: Coordinate i915_active with its own mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200110110402.1231745-2-chris@chris-wilson.co.uk
(cherry picked from commit 1b8bfc5726)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Currently we first to try to unbind the VMA (and lazily rebind on next
use) as an optimisation during restore_ggtt_mappings. Ideally, the only
objects in the GGTT upon resume are the pinned kernel objects which
can't be unbound and need to be restored. As the unbind interferes with
the plan to mark those objects as active for error capture, forgo the
optimisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200110110402.1231745-1-chris@chris-wilson.co.uk
(cherry picked from commit 80e5351df1)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
There seems to be some undocumented bandwidth
bottleneck/dependency which scales with CDCLK,
causing FIFO underruns when CDCLK is too low,
even when it's correct from BSpec point of view.
Currently for TGL platforms we calculate
min_cdclk initially based on pixel_rate divided
by 2, accounting for also plane requirements,
however in some cases the lowest possible CDCLK
doesn't work and causing the underruns.
We've found experimentally that raising cdclk to
at least pixel_rate (rather than pixel_rate/2)
eliminates these underruns, so let's use this as a
temporary workaround until the hardware team
can suggest a more precise remedy.
Explicitly stating here that this seems to be currently
rather a Hack, than final solution.
v2: Use clamp operation instead of min(Matt Roper)
v3: - Fixed commit message(Matt Roper)
- Now using pixel_rate instead of max_cdclk(Jani Nikula)
- Switched to max from clamp(Ville Syrjälä)
Hopefully this hybrid satisfies everyone :)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/issues/402
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200109220547.23817-1-stanislav.lisovskiy@intel.com
Fix build error:
./drivers/gpu/drm/i915/selftests/i915_random.h: In function i915_prandom_u32_max_state:
./drivers/gpu/drm/i915/selftests/i915_random.h:48:23: error:
implicit declaration of function mul_u32_u32; did you mean mul_u64_u32_div? [-Werror=implicit-function-declaration]
return upper_32_bits(mul_u32_u32(prandom_u32_state(state), ep_ro));
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 7ce5b6850b ("drm/i915/selftests: Use mul_u32_u32() for 32b x 32b -> 64b result")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200107135014.36472-1-yuehaibing@huawei.com
(cherry picked from commit 62bf5465b2)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We use PCI device path in the registered PMU name in order to distinguish
between multiple GPUs. But since tools/perf reserves a special meaning to
dash and colon characters we need to transliterate them to something else.
We choose an underscore.
v2:
* Use strreplace. (Chris)
* Dashes are not good either. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Fixes: 05488673a4 ("drm/i915/pmu: Support multiple GPUs")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200110113253.12535-1-tvrtko.ursulin@linux.intel.com
When submitting a fenced command we must lock the object reservations
because virtio_gpu_queue_fenced_ctrl_buffer() unlocks after adding the
fence.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Tested-by: Jann Horn <jannh@google.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200110094535.23472-1-kraxel@redhat.com
With the db845c running AOSP, I see the following error on every
frame on the home screen:
[drm:dpu_plane_atomic_check:915] [dpu error]plane33 invalid src 2880x1620+0+470 line:2560
This is due to the error paths in atomic_check using
DPU_ERROR_PLANE(), and the drm_hwcomposer using atomic_check
to decide how to composite the frame (thus it expects to see
atomic_check to fail).
In order to avoid spamming the logs, this patch converts the
DPU_ERROR_PLANE() calls to DPU_DEBUG_PLANE() calls in
atomic_check.
Cc: Todd Kjos <tkjos@google.com>
Cc: Alistair Delva <adelva@google.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: freedreno@lists.freedesktop.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>