* impl sqrt on CUDA
fix parser of Gather and ReduceMean
* fix test_gather
* fix test_cuda_gather
* impl sqrt cpu and add sqrt to test_cuda_unary
* cuda_unary supports arbitary shapes
* fix SplitOp with dim=-1
* fix SplitOp with dim=-1
* move memory format transformation to TensorObj
clang format
add MemoryFormat for tensorObj.
use post_ops for fused conv/deconv
Distinguish mkl op_timer from cuda op timer.
add act optype to conv and deconv
add operator timer
add mkl kernel for convTransposed
minor fix for group conv
do not use cblas_sgemm_batch
CpuRuntimeObj->NativeCpuRuntimeObj
add matmul op for mkl
* fix: fix bugs when rebasing from master
fix: fix bugs when rebasing from master
* fix: update api after rebasing
* fix: fix format; fix onnx import
* fix: fix clang-format
* [fix] fix conv_transpose test
* [fix] use stronger test case for transposed conv
* [fix] remove tensor memory format; fix mkl transpose conv
* [fix] add FIXME tag for op_timer python api
---------
Co-authored-by: whjthu <haojie0429@gmail.com>
fix numInputs of batchNorm, add new line in file ending.
ADD: batch norm operator and cuda kernel.
add training
remove comments.
fix compile error.
add batch norm operator and cuda kernel.
* ADD:concat/split operator and cuda kernels
refector
minor change comment
ADD:concat/split operator and cuda kernels
merge split_kernel and concat_kernel to split_concat_kernel.
Revert "fix"
This reverts commit 459926be09a838658ec55f1e0a72b3cf17037d5c.
fix
ADD:concat/split operator and cuda kernels
change whole tensor name to composed tensor
fix some
remove unused header.
rebase
add CudaKernel
add test for split.
ADD split operator and cuda kernel.
modify test.
ADD:concat operator and cuda kernel.
ADD:concat/split operator and cuda kernels
fix some
remove unused header.
rebase
add CudaKernel
ADD:concat/split operator and cuda kernels
add test for split.
ADD split operator and cuda kernel.
modify test.
ADD:concat operator and cuda kernel.
* remove extra comment; typo fix.
Co-authored-by: Haojie Wang <haojie0429@gmail.com>