* move memory format transformation to TensorObj
clang format
add MemoryFormat for tensorObj.
use post_ops for fused conv/deconv
Distinguish mkl op_timer from cuda op timer.
add act optype to conv and deconv
add operator timer
add mkl kernel for convTransposed
minor fix for group conv
do not use cblas_sgemm_batch
CpuRuntimeObj->NativeCpuRuntimeObj
add matmul op for mkl
* fix: fix bugs when rebasing from master
fix: fix bugs when rebasing from master
* fix: update api after rebasing
* fix: fix format; fix onnx import
* fix: fix clang-format
* [fix] fix conv_transpose test
* [fix] use stronger test case for transposed conv
* [fix] remove tensor memory format; fix mkl transpose conv
* [fix] add FIXME tag for op_timer python api
---------
Co-authored-by: whjthu <haojie0429@gmail.com>
* fix a little bug which found by new verison CMake
* add code for support BangC language kernel , just like Cuda kernel, not
library
* add bangc kernel
* support BangC kernel
* add code for support BangC kernel
* support bangc kernel
* fix some code from reviewer
* fix code of template fumction
* add code for support bangc kernel
* fix bangc format
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
* Add: interface for membound TVM kernel and test
* add getAnsorCode
* add evaluation, but link failed
* add evaluation of kernel, but link failed
* Fix: link libcuda and nvrtc
* add print
* Add: const for source of copy
* compile and evaluate the kernel
* add compute
* fix gen_ansor_op.py
* fix membound_TVM
* format and fix CMakeLists.txt
* fix memory leak
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>
* add code for cambricon mlu, bang, cnnl
* add code for support cambricon mlu,cnnl,cnrt
* add code for support mlu
* add code for support cambricon cnnl
* add code for support mlu
* add code for mlu
* add code for mlu
`
* Update CMakeLists.txt
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: zhengly123 <zhengly123@outlook.com>
* use protobuf for tensor data save,write,read, in chinese 序列化和反序列化
* add protobuf
* add code for tensor load & save from/to file
* add code for tensor laod & save
* add code for tensor load & save
* add code for tensor save & load
* add code for tensor save & load
* add code for save & load
* add code for load & save
* add code for tensor load & save
* add code for tensor save & load
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
* Function tune and corresponding testcase.
*Add: Tune function in /src/kernel/cuda/conv.cc and corresponding testcase in test_conv.
*Fix: A little bug of perfRecord using in /src/core/runtime.cc.
* Tune part debug
*Add: recover the code, fixed the commit error.
*Add: some anotations in tune function
* clang formmat test
* Fix: mem leak in CUDA Runtime and Conv
* Fix: sync in conv and default sync in timeit
* Change the way to tune operator conv.
Timeit function cudNNUnfused -> Timeit function cudnnConvolutionForward.
* Change: merge the common part of cudnnunfused&tune into cudnndescriptoraccess
* clang test
* clang-format
* clang-format bash.
* Added operator G2BMM and corresponding testcase.
*Added files related to operator G2BMM creating&calling.
*Added custom_ops.cuh&custom_op.h.
* Add operator GBMML
* new version
* Fix: G2BMM and GBMM kernel bugs
* Added testcase of operator GBMML
* clang format
* Added cmake option REQUIRE_GCC9
* Delete redundent file
* Renamed class GBMML into GBMM
* clang format
* Reviewed.
* Added cudahostcompier option.
* Add: explicit CMAKE_CUDA_HOST_COMPILER
* Rename gbmm kernel
* Fix: nvcc warning in GBMM and G2BMM
Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Class "Cuda Runtime" fulfills function "tune" and adds corresponding testcase.
*Add: convCudnn::tune, convCudnn::cuDNNdescriptorAccess.
*Add: testcase tune.
*Fix: a brief bug in CPU Runtime.
* Fix: add warm-up and repetition in timing
* Add: CUDA runtime and float support
* Refactor: Cuda and Cpu runtimes inherit Runtime
* Add: environment script for Lotus
* Add: Lotus build instructions
* Update README.md
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>