The offset should be 8 on Kepler and later.
Signed-off-by: Vince Hsu <vinceh@nvidia.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Very rough, no idea how correct it is at this point, but it prevents
getteximage-depth from piglit from hanging the GPU.
v2: updated with NV_PCE_FE_LAUNCHERR_REPORT values provided by NVIDIA
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The CPU-side tracking of engine runlists was not protected by a lock,
leading to list corruption, eventually causing runlist_update() to
overrun the GPU-side runlist, triggering an OOPS.
Fixes some of the issues noticed during parallel piglit runs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We want to unlock nv_devices_mutex in this error path as well.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Without this patch a pixel clock rate above 165 MHz on a TMDS link is
assumed to be dual link. This is true for DVI, but not for HDMI. HDMI
supports no dual link, but it supports pixel clock rates above 165 MHz.
Only activate Dual Link mode when it is actually possible and requested.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
[imirkin: check for hdmi monitor for computing proto, use sor ctrl to
enable extra config bit]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
NVIDIA have indicated that the workaround is required on all GK10[467]
boards that have the PGOB fuse set.
I've left the commandline option in place for now, as paranoia.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Each GPCCS unit was reading the mask from GPC0, which causes problems on
boards where some GPCs are missing PPCs.
Part of the fix for fdo#92761.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There's a few places where we need to access a GPC register from ucode,
but outside of the falcon's io address space. To do this we need to
calculate the offset based on which GPC we're executing on.
This used to be done manually, but we've since found a "base" offset
that can be added by the hardware. To use this, an extra bit needs to
be set in the register address, which is what this macro achieves.
There should be no functional change from this commit.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
These nvkm_object_func structures are never modified. All other
nvkm_object_func structures are declared as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GF110+ supports both the A and B compute classes, make sure to accept
both.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
NVIDIA provided the documentation for mp error 0x10, INVALID_ADDR_SPACE,
which apparently happens when trying to use an atomic operation on
local or shared memory (instead of global memory).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Was not able to obtain a trace of NVRM due to kernel version annoyances,
however, experimentally confirmed that the WAR we use on NV50/G8x boards
works here too.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Increase clock timeout of some unknown engines in order to avoid failure
at high gpcclk rate.
This fixes IBUS read faults on my GF119 when reclocking is manually
enabled. Note that memory reclocking is completely broken and NvMemExec
has to be disabled to allow core clock reclocking only.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
I got confirmation that we can read and change the voltage with the same code.
The divider is also computed correctly on the gm204 we got our hands on.
Thanks to Yoshimo on IRC for executing the tests on his gm204!
Signed-off-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Let's ignore the other desktop Maxwells until I get my hands on one and confirm
that we still can change the voltage.
Signed-off-by: Martin Peres <martin.peres@free.fr>
Most Keplers actually use the GPIO-based voltage management instead of the new
PWM-based one. Use the GPIO mode as a fallback as it already gracefully handles
the case where no GPIOs exist.
All the Maxwells seem to use the PWM method though.
v2:
- Do not forget to commit the PWM configuration change!
Signed-off-by: Martin Peres <martin.peres@free.fr>
Current Tegra code taking advantage of the IOMMU assumes a hardcoded
value for the IOMMU bit. Make it a platform property instead for
flexibility.
v2 (Ben Skeggs): remove nvkm dependence on drm structures
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Pull drm fixes from Dave Airlie:
"Just a bunch of fixes to squeeze in before -rc1:
- three nouveau regression fixes
- one qxl regression fix
- a bunch of i915 fixes
... and some core displayport/atomic fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau/device: enable c800 quirk for tecra w50
drm/nouveau/clk/gt215: Unbreak engine pausing for GT21x/MCP7x
drm/nouveau/gr/nv04: fix big endian setting on gr context
drm/qxl: validate monitors config modes
drm/i915: Allow DSI dual link to be configured on any pipe
drm/i915: Don't try to use DDR DVFS on CHV when disabled in the BIOS
drm/i915: Fix CSR MMIO address check
drm/i915: Limit the number of loops for reading a split 64bit register
drm/i915: Fix broken mst get_hw_state.
drm/i915: Pass hpd_status_i915[] to intel_get_hpd_pins() in pre-g4x
uapi/drm/i915_drm.h: fix userspace compilation.
drm/i915: Always mark the object as dirty when used by the GPU
drm/dp: Add dp_aux_i2c_speed_khz module param to set the assume i2c bus speed
drm/dp: Adjust i2c-over-aux retry count based on message size and i2c bus speed
drm/dp: Define AUX_RETRY_INTERVAL as 500 us
drm/atomic: Fix bookkeeping with TEST_ONLY, v3.
Broken since "gr: convert user classes to new-style nvkm_object"
Tested on a PPC64 G5 + NV34
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Pull drm updates from Dave Airlie:
"This is the main pull request for the drm for 4.3. Nouveau is
probably the biggest amount of changes in here, since it missed 4.2.
Highlights below, along with the usual bunch of fixes.
All stuff outside drm should have applicable acks.
Highlights:
- new drivers:
freescale dcu kms driver
- core:
more atomic fixes
disable some dri1 interfaces on kms drivers
drop fb panic handling, this was just getting more broken, as more locking was required.
new core fbdev Kconfig support - instead of each driver enable/disabling it
struct_mutex cleanups
- panel:
more new panels
cleanup Kconfig
- i915:
Skylake support enabled by default
legacy modesetting using atomic infrastructure
Skylake fixes
GEN9 workarounds
- amdgpu:
Fiji support
CGS support for amdgpu
Initial GPU scheduler - off by default
Lots of bug fixes and optimisations.
- radeon:
DP fixes
misc fixes
- amdkfd:
Add Carrizo support for amdkfd using amdgpu.
- nouveau:
long pending cleanup to complete driver,
fully bisectable which makes it larger,
perfmon work
more reclocking improvements
maxwell displayport fixes
- vmwgfx:
new DX device support, supports OpenGL 3.3
screen targets support
- mgag200:
G200eW support
G200e new revision support
- msm:
dragonboard 410c support, msm8x94 support, msm8x74v1 support
yuv format support
dma plane support
mdp5 rotation
initial hdcp
- sti:
atomic support
- exynos:
lots of cleanups
atomic modesetting/pageflipping support
render node support
- tegra:
tegra210 support (dc, dsi, dp/hdmi)
dpms with atomic modesetting support
- atmel:
support for 3 more atmel SoCs
new input formats, PRIME support.
- dwhdmi:
preparing to add audio support
- rockchip:
yuv plane support"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1369 commits)
drm/amdgpu: rename gmc_v8_0_init_compute_vmid
drm/amdgpu: fix vce3 instance handling
drm/amdgpu: remove ib test for the second VCE Ring
drm/amdgpu: properly enable VM fault interrupts
drm/amdgpu: fix warning in scheduler
drm/amdgpu: fix buffer placement under memory pressure
drm/amdgpu/cz: fix cz_dpm_update_low_memory_pstate logic
drm/amdgpu: fix typo in dce11 watermark setup
drm/amdgpu: fix typo in dce10 watermark setup
drm/amdgpu: use top down allocation for non-CPU accessible vram
drm/amdgpu: be explicit about cpu vram access for driver BOs (v2)
drm/amdgpu: set MEC doorbell range for Fiji
drm/amdgpu: implement burst NOP for SDMA
drm/amdgpu: add insert_nop ring func and default implementation
drm/amdgpu: add amdgpu_get_sdma_instance helper function
drm/amdgpu: add AMDGPU_MAX_SDMA_INSTANCES
drm/amdgpu: add burst_nop flag for sdma
drm/amdgpu: add count field for the SDMA NOP packet v2
drm/amdgpu: use PT for VM sync on unmap
drm/amdgpu: make wait_event uninterruptible in push_job
...
The copyright header in nvkm/engine/device/platform.c has been replaced
with the NVIDIA one from drm/nouveau_platform.c, as most of the actual
code is now theirs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This ensures we have a valid mask of disabled engines before we start
trying to execute fini()/init() on the subdevs, potentially touching
devices that don't exist.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Replaces the piece-by-piece (in response to NV_DEVICE ctor args) device
contruction with a once-off all-or-nothing approach, eliminating some
tricky refcounting issues. The partial device init capability was only
required by some tools, and has been moved to probe time instead.
Temporarily removes a workaround for some boards where we need to fiddle
with AGP registers before executing the DEVINIT scripts. A later commit
in this series reinstates it.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Pretty much every subdev/engine is going to need access to nvkm_device
shortly to touch registers and/or output messages.
The odd placement of the includes is necessary to work around some
inter-dependencies that currently exist. This will be fixed later.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
A future commit will hide the platform/pci specifics from nvkm_device,
but it's still very useful in a lot of places to have access to the
Linux device struct.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Will be used in upcoming commits to remove the need for lookup/runtime
type-checking functions when accessing foreign subdevs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
MSI interrupts appear to not work for nv46 based cards. Change the mc
subdev oclass for these cards from nv44 to nv4c, the nv4c mc code is
identical to the nv44 mc code except that it does not use msi
(it does not define a msi_rearm callback).
BugLink: https://bugs.freedesktop.org/show_bug.cgi?id=90435
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
High level hardware events related to PBFB will monitor all partitions.
While we are at it, fix bitfield for this mux.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This mux only exists on GF108+ (except for GF110 one), but since it is
not used by the userspace we can drop it for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Hardware signals index 0x00 are defined for some domains and they have
to be allowed to enable sources like the others.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
I thought that using TPC[0x0] like for G84:GT215 was sufficient on G80,
but it's actually not the case. According to NVIDIA PerfKit on Windows,
we have to configure PGRAPH related muxs on TPC[0x3] for this chipset.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Recognize GM20B and assign the right engines and subdevs.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Add support for GM20B's graphics engine, based on GK20A. Note that this
code alone will not allow the engine to initialize on released devices
which require PMU-assisted secure boot.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GK20A's initialization was based on GK104, but differences exist in the
way the initial context is built and the initialization process itself.
This patch follows the same initialization sequence as nvgpu performs
to avoid bad surprises. Since the register bundles initialization also
differ considerably from GK104, the register packs are now loaded from
firmware files, again similarly to what is done with nvgpu.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
NVIDIA will officially start providing GR firmwares through
linux-firmware for GPUs that require it. Change the GR firmware lookup
function to use these files.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
These signals and sources have been reverse engineered from CUPTI
(Linux). Graphics signals exposed by PerfKit (Windows only) will be
added later. I need to reverse engineer them and it's a bit painful.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This trivial patch makes thing more consistent since hardware signals
names are prefixed by 'pcXX'.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is going to be very useful for GF100+ because each GPC can
have its own domain of counters.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
These signals and sources have been reverse engineered from CUPTI
(Linux). Graphics signals exposed by PerfKit (Windows only) will be
added later. I need to reverse engineer them and it's a bit painful.
This commit also adds a new class for GF108 and GF117.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
These signals and sources have been reverse engineered from NVIDIA
PerfKit (Windows) and CUPTI (Linux), they will be used to build complex
hardware events from the userspace.
This commit also adds a new class for GT200.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Configuring counters from the userspace require the kernel to handle some
logic related to performance counters. Basically, it has to find a free
slot to assign a counter, to handle extra counting modes like B4/B6 and it
must return and error when it can't configure a counter.
In my opinion, the kernel should not handle all of that logic but it
should only write the configuration sent by the userspace without
checking anything. In other words, it should overwrite the configuration
even if it's already counting and do not return any errors.
This patch allows the userspace to configure a domain instead of
separate counters. This has the advantage to move all of the logic to
the userspace.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This adds a new method NVIF_PERFCTR_V0_INIT which starts a batch of
hardware counters for sampling. This will allow the userspace to start
a monitoring session using the INIT method and to stop it with SAMPLE,
for example before and after a frame is rendered.
This commit temporarily breaks nv_perfmon but this is going to be fixed
with the upcoming patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This allows to query the ID, the mask and the user-readable name of
sources for each signal.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
A source (or multiplexer) is a tuple addr+mask+shift which allows to
control a block of signals. The maximum number of sources that a signal
can define is arbitrary limited to 8 and this should be large enough.
This patch allows to define multi-level of sources for a signal.
Each different sources are stored to a global list and will be exposed
to the userspace through the nvif interface in order to avoid conflicts.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This signal index must be always allowed even if it's not clearly
defined in a domain in order to monitor a counter like 0x03020100
because it's the default value of signals.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
16 bits is large enough to store the maximum number of signals available
for one domain (i.e. 256).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will allow to configure performance counters with hardware signal
indexes instead of user-readable names in an upcoming patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>