When kzalloc() fails, we should execute hl_mmu_fini()
to release the MMU module. It's the same when
hl_ctx_init() fails.
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
We need to make sure our device is idle when rebooting a virtual
machine. This is done in the driver level.
The firmware will later handle FLR but we want to be extra safe and
stop the devices until the FLR is handled.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Up until now validation errors were counted in the parsing field
of the cs_counters struct, so we added a new counter and increased
it when needed.
In addition, there were some locations where only one of the counters
was updated (ctx or aggregate) so add the second one to be updated
as well.
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When the firmware security is enabled, the pcie_aux_dbi_reg_addr
register in the PCI controller is blocked. Therefore, ignore
the result of writing to this register and assume it worked. Also
remove the prints on errors in the internal ELBI write function.
If the security is enabled, the firmware is responsible for setting
this register correctly so we won't have any problem.
If the security is disabled, the write will work (unless something
is totally broken at the PCI level and then the whole sequence
will fail).
In addition, remove a write to register pcie_aux_dbi_reg_addr+4,
which was never actually needed.
Moreover, PCIE_DBI registers are blocked to access from host when
firmware security is enabled. Use a different register to flush the
writes.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Driver must fetch FW hard reset capability at every FW boot stage:
preboot, CPU boot, CPU application.
If hard reset is triggered, driver will take into consideration
only the last capability received.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In case the clock gating was enabled in preboot we need to disable it
at the H/W initialization stage before touching the MME/TPC registers.
Otherwise, the ASIC can get stuck. If the security is enabled in
the firmware level, the CGM is always disabled and the driver can't
enable it.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
hw_queues_mirror was renamed to cs_mirror, so revise accordingly a
comment that refers to this list.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
We don't need to set EB on signal packets from collective slave
queues as it degrades performance. Because the slaves are the network
queues, the engine barrier doesn't actually guarantee that the
packet has been sent.
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
FW hard reset capability indication is now moved to preboot stage.
Driver will check if HW is dirty only after it validated preboot
is up. If HW is dirty, driver will perform a hard reset according
to the FW capability.
In addition, FW defines a new message which driver need to send in
order to initiate a hard reset.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As we only fetch the CPU_PLL frequency in gaudi, we don't need a
generic get_pll_frequency function which takes a pll index as input
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When the F/W security is enabled, goya needs to fetch the PSOC pll
frequency through a dedicated interface
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
We want the fixes in here, and this resolves a merge issue with
drivers/misc/habanalabs/common/memory.c.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a new CB IOCTL opcode that enables a user to query about a CB and
get its usage count.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Modify the CS counter of a CB to be atomic, so no locking is required
when it is being modified or read.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
hl_cs_sanity_checks() extracts the CS type bits of the CS flags, by
masking out the non-type bits.
To save the need for updating the function whenever new bits for
non-type flags are added, add an explicit mask for the CS type bits.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Some messages should be changed to debug mode as we want to keep
minimal prints during normal operation of the device.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
If huge range is not valid, driver uses the host range also for
huge page allocations, but driver never frees its allocation.
This introduces a memory leak every time a user closes its context.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Currently, if the f/w is in preboot/u-boot they don't perform the new
reset mechanism. Therefore, the driver needs to reset the device.
To prevent reset of PCI_IF, the driver needs to first configure the
reset units.
If the security is enabled, the driver can't configure the reset units.
In that situation, don't reset the card.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
These defines are 64-bit defines so they need ull suffix.
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
add support for user to request a timestamp upon
cs completion.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
We want to indicate to the user that a certain command submission
is finished long time ago and it is no longer in database.
This means no further information regarding this cs can be obtained.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
We have the ECC type field from the firmware but the driver didn't
print it, so we need to add that field to the ECC print message.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Make a function in gaudi.c to be static
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Once firmware security is enabled, driver must fetch pll frequencies
through the firmware message interface instead of reading the registers
directly.
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
We introduce a new wrapper which allows us to mmu map any size
to any host va_range available. In addition we remove duplicated
code from various places in driver and using this new wrapper
instead.
This wrapper supports mapping only contiguous physical
memory blocks and will be used for mappings that are done to the
driver ASID.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
We have several types of command submissions and the user wants to know
which type of command submission has not finished in time when that
event occurs. This is very helpful for debug.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As part of the security effort in which FW will be handling
sensitive HW registers, hard reset flow will be done by FW
and will be triggered by driver.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
F/W message returns 64bit value but up until now we casted it to
a 32bit variable, instead of receiving 64bit in the first place.
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
After the MMU-code refactoring, the existing MMU debugfs operations
are no longer working so we need to fix them.
In addition, remove the duplicate code that was in the debugfs code
and use the already existing MMU-code.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Multiple locks are usually a source of problems, which in the MMU
case can be avoided since it is relatively rare that both MMU
tables are updated at the same time.
Therefore, use a single shared lock instead of two separate ones.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Add support for reserving va block with alignment different than
page size. This is a pre-requisite for allocations needed in future
ASICs
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Add log prints for security and eFuse boot error bits
Signed-off-by: Guy Nisan <gnisan@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
During hard-reset, the driver rejects further IOCTL calls and prints
an error message. That error message should be printed with the correct
device instead of using only the control device.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Once FW security is enabled there is no access to HBM ecc registers,
need to read values from FW using a dedicated interface.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Driver must fetch FW hard reset capability during boot time,
in order to skip the hard reset flow if necessary.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Whether an ASIC has MMU towards its DRAM is an ASIC property, so
move it to the asic fixed properties structure.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Instead of using a dedicated va range for each internal pool,
we introduce a new way for reserving a va block from an existing
va range. This is a more generic way of reserving va blocks for
future use.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
We want to handle the scenario in which the driver was not able
to kill all user processes due to many memory mappings.
We need to retry again after some period while releasing the cores.
The devices will be unusable and "in-reset" status during that time.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Future command submission types might be submitted to HW not via the
QMAN queues path. However, it would be still required to have the TDR
mechanism for these CS, and thus the patch renames the TDR fields and
replaces the hw_queues_ prefix with cs_.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Use an array of va_ranges instead of keeping each va_range separately,
we do this for better readability and in order to support access to
a specific range in a much elegant manner.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Driver must verify if HW is dirty before trying to fetch preboot
information. Hence, we move this validation to a prior stage of
the boot sequence.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Due to using dma_mmap_coherent() to perform mmap of dma memory, we
had to clear the vm_pgoff field before calling that function.
However, that broke the userspace (profiler tool) as they relied
on searching the /proc/self/maps for these values to correctly
"disassemble" the topology recipe.
To re-enable that functionality, the driver can simply restore the
value of vm_pgoff before returning to userspace but after calling
dma_mmap_coherent().
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>