Commit Graph

5 Commits

Author SHA1 Message Date
kilinchange 0dc5347089
memory_allocator (#103)
* - add LazyAllocator class
- calculate memory consumption at present

* - basic function of lazy_allocator, remaining test

* - modify LazyAllocator

* - modify InfiniTensor to fit LazyAllocator

* - add setDataBlob
- modify alignment
- fix GraphObj::dataMalloc

* - modified alignment value (64 bytes -> 8 bytes)
- fix LazyAllocator::getPtr()
- some debug code and comments
- do alignment by changing size instead of tailAddr

* - fix some problem

* - translate Chinese comments to English

* - format codes

* - fix test

* - code format

* - modify codes as YdrMaster and bitzyz suggested

* - code format

* - modify codes as constroy suggested

* - codes format

* - modify alignment on cuda

* - code format

* - add test_lazy_allocator
- fix tests that did not add the input tensor to graph.tensors
- fix tests that initialized tensor data before calling graph->dataMallocate()

* - code format

* - remove gpu runtime in test_lazy_allocator

* - fix test_lazy_allocator: remove cuda include

* - add test

* - code format

* - add ifdef for test of allocator

* - code format

* - fix test: remove unused ifdef

* - fix bang test

* - code format

* Merge branch 'master' into dcj/memory_allocator

* fix: fix cuda conv_fp16 run fail

* fix bang_runtime.cc and cuda_runtime.cc

* - update mkl code

* - fix codes for mkl

* - code format

* - remove unused commented codes
- add an empty line at the end of the blob.cc

---------

Co-authored-by: zhangyunze <z13785159769@163.com>
2023-08-13 13:39:35 +08:00
zhengly123 a1974aabcd
NNET supports TVM backend and kernels (#78)
* Add: mutator InfoGAN minimum test

* Add: cache and padding (bugs!!)

* Add: expression reader as a cmake target

* Fix: [Intermediate] NMutator::expressionToGraph

To be fixed: matmul with implicit broadcast

* Add: matmul broadcast

* Fix: GraphObj ctor should use cloneTensor

* Fix: cuBLAS failure when codegen is enabled

* Add: Exception for checkCuError

* Fix: graph OpList ctor

* Add: expr simplification for TVM

* Add: TVM headers and CMake include paths

* Add: CMake config

* Add: PackedFunc (broken)

* Fix: remove cuCtxCreate, which makes TVM fail

* Fix: membound_tvm

* Fix: test_memboundOp

* Add: PRelu Expr and AsTVMVisitor

* Add: Random generator

* Add: support TVM packed function

* Fix: specify runtime

* Add: CMake support of TVM

* Add: detailed output of Matmul

* Add: comments for Matmul

* Chore: format and comments

* Chore: GraphObj::selfCheck without assert control

* Fix: CMAKE_CXX_FLAGS in CMakeLists

* fix merge bug

* update api for mkl batchnorm test

* fix lotus env

* fix header bug

---------

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>
Co-authored-by: whjthu <haojie0429@gmail.com>
2023-04-18 00:26:36 +08:00
wendy12022 86ec4036ce
ADD: add mkl runtime for Intel CPU, and add mkl kernels for matmul/conv/convtransposed. (#61)
* move memory format transformation to TensorObj

clang format

add MemoryFormat for tensorObj.

use post_ops for fused conv/deconv

Distinguish mkl op_timer from cuda op timer.

add act optype to conv and deconv

add operator timer

add mkl kernel for convTransposed

minor fix for group conv

do not use cblas_sgemm_batch

CpuRuntimeObj->NativeCpuRuntimeObj

add matmul op for mkl

* fix: fix bugs when rebasing from master

* fix: update api after rebasing

* fix: fix format; fix onnx import

* fix: fix clang-format

* [fix] fix conv_transpose test

* [fix] use stronger test case for transposed conv

* [fix] remove tensor memory format; fix mkl transpose conv

* [fix] add FIXME tag for op_timer python api

---------

Co-authored-by: whjthu <haojie0429@gmail.com>
2023-03-27 21:28:49 +08:00
wendy12022 a4d6426589
ADD: batch norm operator and cuda kernel. (#44)
fix numInputs of batchNorm, add new line in file ending.

ADD: batch norm operator and cuda kernel.

add training

remove comments.

fix compile error.

add batch norm operator and cuda kernel.
2022-10-15 16:29:28 +08:00
zhengly123 2f8f706f1c
Fix CMake USE_CUDA (#36)
* Fix: build lib without cuda

* Chore: rename GBMM and G2BMM files

* Fix: separate CUDA tests from operator tests

* Fix: CMake CMP0104

* Chore: fix typo

* Chore: remove unused headers

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-21 12:28:00 +08:00