InfiniTensor/include/cuda
deathwings602 11d5aa1ccc
Add TVM codegen for MemboundOp (#35)
* Add: interface for membound TVM kernel and test

* add getAnsorCode

* add evaluation, but link failed

* add evaluation of kernel, but link failed

* Fix: link libcuda and nvrtc

* add print

* Add: const for source of copy

* compile and evaluate the kernel

* add compute

* fix gen_ansor_op.py

* fix membound_TVM

* format and fix CMakeLists.txt

* fix memory leak

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>
2022-09-22 18:06:45 +08:00
File | Last commit | Date
cuda_common.h | Add TVM codegen for MemboundOp (#35) | 2022-09-22 18:06:45 +08:00
cuda_element_wise.h | ADD add/mul/sub/div/pow operators and CPU/CUDA kernels (#26) | 2022-09-09 13:43:59 +08:00
cuda_kernel_wihtout_config.h | Fix: PerfRecord in shared pointers (#31) | 2022-09-18 20:27:18 +08:00
cuda_runtime.h | Add TVM codegen for MemboundOp (#35) | 2022-09-22 18:06:45 +08:00
cuda_unary.h | Add activation operators and kernels | 2022-09-16 13:58:57 +08:00
cuda_utility.h | Simplify tensor transfer between CPU and CUDA (#10) | 2022-08-25 11:29:16 +08:00
gbmm_g2bmm.cuh | Fix CMake USE_CUDA (#36) | 2022-09-21 12:28:00 +08:00
gbmm_g2bmm.h | Fix CMake USE_CUDA (#36) | 2022-09-21 12:28:00 +08:00