Fix some
remove useless code.
add div/pow kernel
Add add/mul/sub operators.
fix cpu kernel.
add element wise kenerl for cuda.
ADD element wise operator.
* Function tune and corresponding testcase.
*Add: Tune function in /src/kernel/cuda/conv.cc and corresponding testcase in test_conv.
*Fix: A little bug of perfRecord using in /src/core/runtime.cc.
* Tune part debug
*Add: recover the code, fixed the commit error.
*Add: some anotations in tune function
* clang formmat test
* Fix: mem leak in CUDA Runtime and Conv
* Fix: sync in conv and default sync in timeit
* Change the way to tune operator conv.
Timeit function cudNNUnfused -> Timeit function cudnnConvolutionForward.
* Change: merge the common part of cudnnunfused&tune into cudnndescriptoraccess
* clang test
* clang-format
* clang-format bash.
* Added operator G2BMM and corresponding testcase.
*Added files related to operator G2BMM creating&calling.
*Added custom_ops.cuh&custom_op.h.
* Add operator GBMML
* new version
* Fix: G2BMM and GBMM kernel bugs
* Added testcase of operator GBMML
* clang format
* Added cmake option REQUIRE_GCC9
* Delete redundent file
* Renamed class GBMML into GBMM
* clang format
* Reviewed.
* Added cudahostcompier option.
* Add: explicit CMAKE_CUDA_HOST_COMPILER
* Rename gbmm kernel
* Fix: nvcc warning in GBMM and G2BMM
Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
* Function tune and corresponding testcase.
*Add: Tune function in /src/kernel/cuda/conv.cc and corresponding testcase in test_conv.
*Fix: A little bug of perfRecord using in /src/core/runtime.cc.
* Tune part debug
*Add: recover the code, fixed the commit error.
*Add: some anotations in tune function
* clang formmat test
* Fix: mem leak in CUDA Runtime and Conv
* Fix: sync in conv and default sync in timeit
* Change the way to tune operator conv.
Timeit function cudNNUnfused -> Timeit function cudnnConvolutionForward.
* Change: merge the common part of cudnnunfused&tune into cudnndescriptoraccess
* clang test
* clang-format
* clang-format bash.
* Chore: remove print and blank lines
Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Class "Cuda Runtime" fulfills function "tune" and adds corresponding testcase.
*Add: convCudnn::tune, convCudnn::cuDNNdescriptorAccess.
*Add: testcase tune.
*Fix: a brief bug in CPU Runtime.
* Fix: add warm-up and repetition in timing
* Add: CUDA runtime and float support
* Refactor: Cuda and Cpu runtimes inherit Runtime
* Add: environment script for Lotus
* Add: Lotus build instructions
* Update README.md
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
* Refactor: operator hash and inferShape
* Add: hash without shape
* Add: inferShape interface for given input tensors
* Add: construct outputs in op ctor
* Add: comments for matmul
* Add: opType in AttrVector and WorkloadVector
* Chore: _graph -> graph in Op ctor
* Chore: change the "Node" suffix to "Obj"
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>