* - add LazyAllocator class
- calculate memory consumption at present
* - basic function of lazy_allocator, remaining test
* - modify LazyAllocator
* - modify InfiniTensor to fit LazyAllocator
* - add setDataBlob
- modify alignment
- fix GraphObj::dataMalloc
* - modified alignment value(64bytes -> 8bytes)
- fix LazyAllocator::getPtr()
- some dubug codes and commonts
- do alignment by chaning size instead of tailAddr
* - fix some problem
* - translate chinese comments to english
* - format codes
* - fix test
* - code format
* - modify codes as YdrMaser and bitzyz suggested
* - code format
* - modify codes as constroy suggested
* - codes format
* - modify alignment on cuda
* - code format
* - add test_lazy_allocator
- fix tests where not add input tensor into graph.tensors
- fix tests where init tensor's data before calling graph->dataMallocate()
* - code format
* - remove gpu runtime in test_lazy_allocator
* - fix test_lazy_allocator: remove cuda include
* - add test
* - code format
* - add ifdef for test of allocator
* - code format
* - fix test: remove unused ifdef
* - fix bang test
* - code format
* Merge branch 'master' into dcj/memory_allocator
* fix: fix cuda conv_fp16 run fail
* fix bang_runtime.cc and cuda_runtime.cc
* - update mkl code
* - fix codes for mkl
* - code format
* - remove unused commented codes
- add an empty line at the end of the blob.cc
---------
Co-authored-by: zhangyunze <z13785159769@163.com>
* move memory format transformation to TensorObj
clang format
add MemoryFormat for tensorObj.
use post_ops for fused conv/deconv
Distinguish mkl op_timer from cuda op timer.
add act optype to conv and deconv
add operator timer
add mkl kernel for convTransposed
minor fix for group conv
do not use cblas_sgemm_batch
CpuRuntimeObj->NativeCpuRuntimeObj
add matmul op for mkl
* fix: fix bugs when rebasing from master
fix: fix bugs when rebasing from master
* fix: update api after rebasing
* fix: fix format; fix onnx import
* fix: fix clang-format
* [fix] fix conv_transpose test
* [fix] use stronger test case for transposed conv
* [fix] remove tensor memory format; fix mkl transpose conv
* [fix] add FIXME tag for op_timer python api
---------
Co-authored-by: whjthu <haojie0429@gmail.com>
fix numInputs of batchNorm, add new line in file ending.
ADD: batch norm operator and cuda kernel.
add training
remove comments.
fix compile error.
add batch norm operator and cuda kernel.