* feat: support mixed dtype
* feat: support cast op
* test: add test for cast op
* feat: support datatype BFloat16
* feat: support data convert fp32 <-> bfp16
* fix: fix all op's infershape func
* fix as review comment
* move memory format transformation to TensorObj
clang format
add MemoryFormat for tensorObj.
use post_ops for fused conv/deconv
Distinguish mkl op_timer from cuda op timer.
add act optype to conv and deconv
add operator timer
add mkl kernel for convTransposed
minor fix for group conv
do not use cblas_sgemm_batch
CpuRuntimeObj->NativeCpuRuntimeObj
add matmul op for mkl
* fix: fix bugs when rebasing from master
fix: fix bugs when rebasing from master
* fix: update api after rebasing
* fix: fix format; fix onnx import
* fix: fix clang-format
* [fix] fix conv_transpose test
* [fix] use stronger test case for transposed conv
* [fix] remove tensor memory format; fix mkl transpose conv
* [fix] add FIXME tag for op_timer python api
---------
Co-authored-by: whjthu <haojie0429@gmail.com>
fix numInputs of batchNorm, add new line in file ending.
ADD: batch norm operator and cuda kernel.
add training
remove comments.
fix compile error.
add batch norm operator and cuda kernel.