* - add LazyAllocator class
- calculate memory consumption at present
* - basic functionality of lazy_allocator; tests remaining
* - modify LazyAllocator
* - modify InfiniTensor to fit LazyAllocator
* - add setDataBlob
- modify alignment
- fix GraphObj::dataMalloc
* - modified alignment value (64 bytes -> 8 bytes)
- fix LazyAllocator::getPtr()
- some debug code and comments
- do alignment by changing size instead of tailAddr
* - fix some problems
* - translate Chinese comments to English
* - format codes
* - fix test
* - code format
* - modify codes as YdrMaser and bitzyz suggested
* - code format
* - modify codes as constroy suggested
* - codes format
* - modify alignment on cuda
* - code format
* - add test_lazy_allocator
- fix tests that did not add the input tensor to graph.tensors
- fix tests that initialized tensor data before calling graph->dataMallocate()
* - code format
* - remove gpu runtime in test_lazy_allocator
* - fix test_lazy_allocator: remove cuda include
* - add test
* - code format
* - add ifdef for test of allocator
* - code format
* - fix test: remove unused ifdef
* - fix bang test
* - code format
* Merge branch 'master' into dcj/memory_allocator
* fix: fix cuda conv_fp16 run failure
* fix bang_runtime.cc and cuda_runtime.cc
* - update mkl code
* - fix codes for mkl
* - code format
* - remove unused commented codes
- add an empty line at the end of blob.cc
---------
Co-authored-by: zhangyunze <z13785159769@163.com>
* move memory format transformation to TensorObj
clang format
add MemoryFormat for tensorObj.
use post_ops for fused conv/deconv
Distinguish mkl op_timer from cuda op timer.
add act optype to conv and deconv
add operator timer
add mkl kernel for convTransposed
minor fix for group conv
do not use cblas_sgemm_batch
CpuRuntimeObj->NativeCpuRuntimeObj
add matmul op for mkl
* fix: fix bugs when rebasing from master
* fix: update api after rebasing
* fix: fix format; fix onnx import
* fix: fix clang-format
* [fix] fix conv_transpose test
* [fix] use stronger test case for transposed conv
* [fix] remove tensor memory format; fix mkl transpose conv
* [fix] add FIXME tag for op_timer python api
---------
Co-authored-by: whjthu <haojie0429@gmail.com>
fix numInputs of batchNorm, add newline at end of file.
ADD: batch norm operator and cuda kernel.
add training
remove comments.
fix compile error.
add batch norm operator and cuda kernel.
* ADD:concat/split operator and cuda kernels
refactor
minor change comment
ADD:concat/split operator and cuda kernels
merge split_kernel and concat_kernel to split_concat_kernel.
Revert "fix"
This reverts commit 459926be09a838658ec55f1e0a72b3cf17037d5c.
fix
ADD:concat/split operator and cuda kernels
change whole tensor name to composed tensor
fix some
remove unused header.
rebase
add CudaKernel
add test for split.
ADD split operator and cuda kernel.
modify test.
ADD:concat operator and cuda kernel.
* remove extra comment; typo fix.
Co-authored-by: Haojie Wang <haojie0429@gmail.com>