Liyan Zheng
e86e993ed4
Add: CUDA graph stream capture (MemboundOp fails)
2023-04-19 16:32:16 +08:00
YdrMaster
26f0d13c26
Dev for 202303ddl ( #66 )
* add activation operation relu, tanh, sigmoid on mlu
* commit for format
* add activation backward operation
* add test for activation_backward
* add test
* add convbpfilter
* fix
* add transpose code and test
* add trigon function operation on mlu: sin,cos,tan,asin,sinh,asinh
* add copy operation on mlu
* add ceil operation and floor operation
* add operation clip
* add operation cnnl div, test and test for divdemo bangc kernel
* add divnonan operation and test
* add erf operation
* add exp operation
* add operation fill
* add log operation
* add log1p operation
* add l2loss operation
* add maximum and minimum operation
* add mseloss operation
* add negTensor operation
* add power operation
* add reciprocal operation
* add sqrt and rsqrt operation
* add transform operation
* add addn operation
* add muln operation
* cherry-pick some operations
* add floordiv operation and floordivtrunc operation
* add floormod operation
* add cumsum operation
* add det operation
* add pad operation
* format
* add concat operation
* format
* add split operation
* fix concat and split operation
* add round operation
* add pooling operation
* add square operation
* add squaredDifference operation
* code format fix
* add flip operation
* code format fix
* add hardtanh operation
* add logic operation
* add addcdiv and addcmul operation
* add arange operation
* add bitcompute operation
* add net test
* fmt
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* style: rename
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: replace CpuRuntime with NativeCpuRuntime
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix code
* fix code
* fix code by review suggestion
* remove operations that are not ONNX operators
* fix format
* clang format
* refactor: add a templated dataToString layer to tensor print
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: ONNX export
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* feat: add computation graph optimization interface
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* add clip operation
* feat: support importing clip
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* test: add import/export tests to CI
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix batch norm
* feat: add Shape operator
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* feat: support importing unsqueeze
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: correct the clip interface
feat: support importing transpose
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* add broadcast operation
* fix elementwise-broadcast
* fix elementwise broadcast
* add broadcast for gpu elementwise
* feat: pad supports negative axes
feat: export unsupported padding as a standalone pad operator
feat: support importing onnxsim-simplified inception
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: correct the pooling tests
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* feat: export pads; support inception import/export, added to CI
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* feat: support densenet import/export, and add it to CI
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* feat: import squeeze
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix softmax
* feat: export clip and transpose
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* feat: support bias for Conv
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: bias of conv
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: bias of conv
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* feat: import split
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* feat: export split
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: conv
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: conv group
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: matmul bias was not placed in the inputs; corrected
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix example
* fix: correct reduce_mean export
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* refactor: make the slice implementation consistent with ONNX
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* style: do not export two runtime functions
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* doc: Chinese user guide
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* doc: complete the guide
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: fix a data import issue
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fmt
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* feat: add basic Dropout structure; two outputs of different types are not yet supported
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* feat: re-export the optimization interface
feat: import dropout
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* build: add BANG option to Makefile
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix code: test/kernels/bang/test* now uses NativeCpuRuntime.
The change to include/bang/bang_runtime is for the cntoolkit upgrade.
* feat: export bang runtime
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* add USE_BANG=1
* fix matmul
* fix reshape
* fix
* fix activation
* fix transpose
* format
* format
* update Makefile
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* feat: support importing and exporting ConvTranspose
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* add prelu on mlu
* fix: ConvTranspose
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* feat: support importing and exporting PRelu
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* add convtrans on mlu
* fmt
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* docs: update README_CN.md
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix code by review suggestions
* style
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: Softmax axis can fall back to a default value? This looks like non-standard ONNX
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix cuda & intelcpu bugs after merging
---------
Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: whjthu <haojie0429@gmail.com>
2023-04-18 15:10:33 +08:00
zhengly123
a1974aabcd
NNET supports TVM backend and kernels ( #78 )
* Add: mutator InfoGAN minimum test
* Add: cache and padding (bugs!!)
* Add: expression reader as a cmake target
* Fix: [Intermediate] NMutator::expressionToGraph
To be fixed: matmul with implicit broadcast
* Add: matmul broadcast
* Fix: GraphObj ctor should use cloneTensor
* Fix: cuBLAS failure when codegen is enabled
* Add: Exception for checkCuError
* Fix: graph OpList ctor
* Add: expr simplification for TVM
* Add: TVM headers and CMake include paths
* Add: CMake config
* Add: PackedFunc (broken)
* Fix: remove cuCtxCreate which makes TVM fails
* Fix: membound_tvm
* Fix: test_memboundOp
* Add: PRelu Expr and AsTVMVisitor
* Add: Random generator
* Add: support TVM packed function
* Fix: specify runtime
* Add: CMake support of TVM
* Add: detailed output of Matmul
* Add: comments for Matmul
* Chore: format and comments
* Chore: GraphObj::selfCheck without assert control
* Fix: CMAKE_CXX_FLAGS in CMakeLists
* fix merge bug
* update api for mkl batchnorm test
* fix lotus env
* fix header bug
---------
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>
Co-authored-by: whjthu <haojie0429@gmail.com>
2023-04-18 00:26:36 +08:00
wendy12022
c8b2c8ed32
Cpu backend2 ( #77 )
fix review
change Device::MKL to Device::INTELCPU
fix mkl linkage
fix errors according to merge from master
now can call mkl backend
fix softmax/flatten with axis from onnx.
modify README.md
fix memory double free
add env_lotus_intelcpu.sh
fix compile
merge from branch cpu_backend
fix something; add gather
fix something
FIX: directory rename from "mkl" to "intelcpu"
ADD: use oneMKL dpcpp interface to implement matmul kernel.
ADD: add dpcpp as compiler for mkl, and fix warnings for clang compiling.
add dpcpp kernel for pow.
ADD: mkl kernel for pad.
ADD: slice mkl kernel.
ADD: reshape/flatten/identity mkl kernel.
ADD: split mkl kernel.
fix compile error
FIX: fix flattenObj with axis.
ADD reduce_mean mkl kernel.
Add concat mkl kernel.
batchNorm for mkl kernel.
sigmoid mkl kernel.
ADD:add mkl kernel for pooling
add more tests for softmax
Now the softmax cuda kernel supports any axis.
mkl kernel for softmax
softmax
add axis to softmax operator
add mkl kernel for abs tanh
ADD: relu kernel for mkl
fix binary mkl primitives.
add mkl kernel for binary operators
fix compiler error
move stream to runtime
clang format
add MemoryFormat for tensorObj.
use post_ops for fused conv/deconv
Distinguish mkl op_timer from cuda op timer.
add act optype to conv and deconv
add operator timer
add mkl kernel for convTransposed
minor fix for group conv
do not use cblas_sgemm_batch
CpuRuntimeObj->NativeCpuRuntimeObj
add matmul op for mkl
2023-04-17 12:15:23 +08:00
wendy12022
86ec4036ce
ADD: add mkl runtime for Intel CPU, and add mkl kernel for matmul/conv/convtransposed. ( #61 )
* move memory format transformation to TensorObj
clang format
add MemoryFormat for tensorObj.
use post_ops for fused conv/deconv
Distinguish mkl op_timer from cuda op timer.
add act optype to conv and deconv
add operator timer
add mkl kernel for convTransposed
minor fix for group conv
do not use cblas_sgemm_batch
CpuRuntimeObj->NativeCpuRuntimeObj
add matmul op for mkl
* fix: fix bugs when rebasing from master
* fix: update api after rebasing
* fix: fix format; fix onnx import
* fix: fix clang-format
* [fix] fix conv_transpose test
* [fix] use stronger test case for transposed conv
* [fix] remove tensor memory format; fix mkl transpose conv
* [fix] add FIXME tag for op_timer python api
---------
Co-authored-by: whjthu <haojie0429@gmail.com>
2023-03-27 21:28:49 +08:00
YdrMaster
296fcc5aa0
feat: create the pyinfinitensor frontend
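- Python frontend project structure, plus packaging and installation scripts
- rename the compiled backend .so to backend; add GraphHandler for modifying the graph structure
- CI supports testing these features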
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-02-13 09:19:05 +08:00
Hardy
b0c2a08252
Support bang c kernel wanghailu 0927 ( #43 )
* fix a small bug found by a newer version of CMake
* add code to support BangC language kernels, just like CUDA kernels, not
libraries
* add bangc kernel
* support BangC kernel
* add code for support BangC kernel
* support bangc kernel
* fix some code from reviewer
* fix code of template function
* add code for support bangc kernel
* fix bangc format
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2022-09-30 11:01:52 +08:00
zhengly123
1aefc1b27e
Add python interface for CUDA operator evaluation ( #42 )
* Refactor: separate data generator
* Add: python bindings for opTimer
* Fix: test_perfengine
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-27 10:41:12 +08:00
deathwings602
11d5aa1ccc
Add TVM codegen for MemboundOp ( #35 )
* Add: interface for membound TVM kernel and test
* add getAnsorCode
* add evaluation, but link failed
* add evaluation of kernel, but link failed
* Fix: link libcuda and nvrtc
* add print
* Add: const for source of copy
* compile and evaluate the kernel
* add compute
* fix gen_ansor_op.py
* fix membound_TVM
* format and fix CMakeLists.txt
* fix memory leak
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>
2022-09-22 18:06:45 +08:00
Hardy
c7c974f07a
Add bangc runtime and element-wise kernels
* add code for cambricon mlu, bang, cnnl
* add code for support cambricon mlu,cnnl,cnrt
* add code for support mlu
* add code for support cambricon cnnl
* add code for support mlu
* add code for mlu
* add code for mlu
* Update CMakeLists.txt
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: zhengly123 <zhengly123@outlook.com>
2022-09-22 16:57:39 +08:00
zhengly123
2f8f706f1c
Fix CMake USE_CUDA ( #36 )
* Fix: build lib without cuda
* Chore: rename GBMM and G2BMM files
* Fix: separate CUDA tests from operator tests
* Fix: CMake CMP0104
* Chore: fix typo
* Chore: remove unused headers
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-21 12:28:00 +08:00
zhengly123
172d03d6f2
Fix NNet tests after migration ( #27 )
* Fix: interpreter
```
4 - readlog (Failed)
8 - test_TConv2gemm (Failed)
11 - test_conv2conv (Failed)
12 - test_conv2gemm (Failed)
15 - test_g2bmm (Failed)
16 - test_guidedDLT (Subprocess aborted)
22 - test_mergeStage (Subprocess aborted)
```
* Exclude readlog from ctest
* Fix: change the path of logs
```
85% tests passed, 4 tests failed out of 27
Total Test time (real) = 100.69 sec
The following tests FAILED:
10 - test_conv2conv (Timeout)
11 - test_conv2gemm (Timeout)
15 - test_guidedDLT (Subprocess aborted)
21 - test_mergeStage (Subprocess aborted)
Errors while running CTest
```
- test_conv2conv 38529 ms total
- test_conv2gemm 37098 ms total
* Fix: test_mergeStage
* Fix: test_guidedDLT
```
Start 1: test_graph
1/27 Test #1 : test_graph ....................... Passed 0.05 sec
Start 2: test_hash
2/27 Test #2 : test_hash ........................ Passed 0.02 sec
Start 3: test_conv
3/27 Test #3 : test_conv ........................ Passed 4.98 sec
Start 4: test_Interpreter
4/27 Test #4 : test_Interpreter ................. Passed 6.30 sec
Start 5: test_OpSearch
5/27 Test #5 : test_OpSearch .................... Passed 0.02 sec
Start 6: test_Rule2VariableMerging
6/27 Test #6 : test_Rule2VariableMerging ........ Passed 0.03 sec
Start 7: test_TConv2gemm
7/27 Test #7 : test_TConv2gemm .................. Passed 29.45 sec
Start 8: test_as_tvm
8/27 Test #8 : test_as_tvm ...................... Passed 0.02 sec
Start 9: test_compareFormulas
9/27 Test #9 : test_compareFormulas ............. Passed 0.02 sec
Start 10: test_conv2conv
10/27 Test #10 : test_conv2conv ................... Passed 36.55 sec
Start 11: test_conv2gemm
11/27 Test #11 : test_conv2gemm ................... Passed 39.70 sec
Start 12: test_dlt
12/27 Test #12 : test_dlt ......................... Passed 0.03 sec
Start 13: test_exprHash
13/27 Test #13 : test_exprHash .................... Passed 0.02 sec
Start 14: test_g2bmm
14/27 Test #14 : test_g2bmm ....................... Passed 0.16 sec
Start 15: test_guidedDLT
15/27 Test #15 : test_guidedDLT ................... Passed 0.07 sec
Start 16: test_matchConv
16/27 Test #16 : test_matchConv ................... Passed 0.02 sec
Start 17: test_matchElementWise
17/27 Test #17 : test_matchElementWise ............ Passed 0.03 sec
Start 18: test_matchMatmul
18/27 Test #18 : test_matchMatmul ................. Passed 0.02 sec
Start 19: test_matchReshape
19/27 Test #19 : test_matchReshape ................ Passed 0.02 sec
Start 20: test_memboundOp
20/27 Test #20 : test_memboundOp .................. Passed 0.02 sec
Start 21: test_mergeStage
21/27 Test #21 : test_mergeStage .................. Passed 0.02 sec
Start 22: test_oobChecker
22/27 Test #22 : test_oobChecker .................. Passed 0.02 sec
Start 23: test_rangeMagnify
23/27 Test #23 : test_rangeMagnify ................ Passed 0.02 sec
Start 24: test_relaxation
24/27 Test #24 : test_relaxation .................. Passed 0.02 sec
Start 25: test_serializer
25/27 Test #25 : test_serializer .................. Passed 0.03 sec
Start 26: test_simplify
26/27 Test #26 : test_simplify .................... Passed 0.02 sec
Start 27: test_subset
27/27 Test #27 : test_subset ...................... Passed 0.01 sec
100% tests passed, 0 tests failed out of 27
Total Test time (real) = 117.72 sec
```
* Fix: format
* Replace nnet::Ref with infini::Ref
```
Start 1: test_graph
1/27 Test 1: test_graph ....................... Passed 0.02 sec
Start 2: test_hash
2/27 Test 2: test_hash ........................ Passed 0.02 sec
Start 3: test_conv
3/27 Test 3: test_conv ........................ Passed 4.45 sec
Start 4: test_Interpreter
4/27 Test 4: test_Interpreter ................. Passed 4.37 sec
Start 5: test_OpSearch
5/27 Test 5: test_OpSearch .................... Passed 0.02 sec
Start 6: test_Rule2VariableMerging
6/27 Test 6: test_Rule2VariableMerging ........ Passed 0.02 sec
Start 7: test_TConv2gemm
7/27 Test 7: test_TConv2gemm .................. Passed 23.40 sec
Start 8: test_as_tvm
8/27 Test 8: test_as_tvm ...................... Passed 0.02 sec
Start 9: test_compareFormulas
9/27 Test 9: test_compareFormulas ............. Passed 0.01 sec
Start 10: test_conv2conv
10/27 Test 10: test_conv2conv ................... Passed 32.28 sec
Start 11: test_conv2gemm
11/27 Test 11: test_conv2gemm ................... Passed 29.41 sec
Start 12: test_dlt
12/27 Test 12: test_dlt ......................... Passed 0.02 sec
Start 13: test_exprHash
13/27 Test 13: test_exprHash .................... Passed 0.01 sec
Start 14: test_g2bmm
14/27 Test 14: test_g2bmm ....................... Passed 0.14 sec
Start 15: test_guidedDLT
15/27 Test 15: test_guidedDLT ................... Passed 0.06 sec
Start 16: test_matchConv
16/27 Test 16: test_matchConv ................... Passed 0.02 sec
Start 17: test_matchElementWise
17/27 Test 17: test_matchElementWise ............ Passed 0.02 sec
Start 18: test_matchMatmul
18/27 Test 18: test_matchMatmul ................. Passed 0.02 sec
Start 19: test_matchReshape
19/27 Test 19: test_matchReshape ................ Passed 0.01 sec
Start 20: test_memboundOp
20/27 Test 20: test_memboundOp .................. Passed 0.02 sec
Start 21: test_mergeStage
21/27 Test 21: test_mergeStage .................. Passed 0.01 sec
Start 22: test_oobChecker
22/27 Test 22: test_oobChecker .................. Passed 0.01 sec
Start 23: test_rangeMagnify
23/27 Test 23: test_rangeMagnify ................ Passed 0.01 sec
Start 24: test_relaxation
24/27 Test 24: test_relaxation .................. Passed 0.01 sec
Start 25: test_serializer
25/27 Test 25: test_serializer .................. Passed 0.02 sec
Start 26: test_simplify
26/27 Test 26: test_simplify .................... Passed 0.01 sec
Start 27: test_subset
27/27 Test 27: test_subset ...................... Passed 0.00 sec
100% tests passed, 0 tests failed out of 27
Total Test time (real) = 94.47 sec
```
* Relax time limit for CPU conv
```
Start 1: test_graph
1/29 Test 1: test_graph ....................... Passed 0.02 sec
Start 2: test_hash
2/29 Test 2: test_hash ........................ Passed 0.02 sec
Start 3: test_conv
3/29 Test 3: test_conv ........................ Passed 4.47 sec
Start 4: test_matmul
4/29 Test 4: test_matmul ...................... Passed 2.61 sec
Start 5: test_pooling
5/29 Test 5: test_pooling ..................... Passed 2.57 sec
Start 6: test_Interpreter
6/29 Test 6: test_Interpreter ................. Passed 4.35 sec
Start 7: test_OpSearch
7/29 Test 7: test_OpSearch .................... Passed 0.02 sec
Start 8: test_Rule2VariableMerging
8/29 Test 8: test_Rule2VariableMerging ........ Passed 0.02 sec
Start 9: test_TConv2gemm
9/29 Test 9: test_TConv2gemm .................. Passed 23.32 sec
Start 10: test_as_tvm
10/29 Test 10: test_as_tvm ...................... Passed 0.02 sec
Start 11: test_compareFormulas
11/29 Test 11: test_compareFormulas ............. Passed 0.02 sec
Start 12: test_conv2conv
12/29 Test 12: test_conv2conv ................... Passed 32.12 sec
Start 13: test_conv2gemm
13/29 Test 13: test_conv2gemm ................... Passed 30.59 sec
Start 14: test_dlt
14/29 Test 14: test_dlt ......................... Passed 0.02 sec
Start 15: test_exprHash
15/29 Test 15: test_exprHash .................... Passed 0.01 sec
Start 16: test_g2bmm
16/29 Test 16: test_g2bmm ....................... Passed 0.14 sec
Start 17: test_guidedDLT
17/29 Test 17: test_guidedDLT ................... Passed 0.07 sec
Start 18: test_matchConv
18/29 Test 18: test_matchConv ................... Passed 0.02 sec
Start 19: test_matchElementWise
19/29 Test 19: test_matchElementWise ............ Passed 0.02 sec
Start 20: test_matchMatmul
20/29 Test 20: test_matchMatmul ................. Passed 0.02 sec
Start 21: test_matchReshape
21/29 Test 21: test_matchReshape ................ Passed 0.02 sec
Start 22: test_memboundOp
22/29 Test 22: test_memboundOp .................. Passed 0.02 sec
Start 23: test_mergeStage
23/29 Test 23: test_mergeStage .................. Passed 0.01 sec
Start 24: test_oobChecker
24/29 Test 24: test_oobChecker .................. Passed 0.02 sec
Start 25: test_rangeMagnify
25/29 Test 25: test_rangeMagnify ................ Passed 0.02 sec
Start 26: test_relaxation
26/29 Test 26: test_relaxation .................. Passed 0.02 sec
Start 27: test_serializer
27/29 Test 27: test_serializer .................. Passed 0.03 sec
Start 28: test_simplify
28/29 Test 28: test_simplify .................... Passed 0.02 sec
Start 29: test_subset
29/29 Test 29: test_subset ...................... Passed 0.00 sec
100% tests passed, 0 tests failed out of 29
Total Test time (real) = 100.65 sec
```
* Remove out-of-date tests
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-13 15:17:22 +08:00
Hardy
03de74f4bc
Tensor serialization ( #25 )
* use protobuf for tensor data save, write, and read (serialization and deserialization)
* add protobuf
* add code for tensor load & save from/to file
* add code for tensor load & save
* add code for tensor load & save
* add code for tensor save & load
* add code for tensor save & load
* add code for save & load
* add code for load & save
* add code for tensor load & save
* add code for tensor save & load
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
2022-09-13 11:27:41 +08:00
Anmuliar
0409eafb5f
Operators g2bmm&gbmm transplantation ( #24 )
* Function tune and corresponding testcase.
*Add: Tune function in /src/kernel/cuda/conv.cc and corresponding testcase in test_conv.
*Fix: A little bug of perfRecord using in /src/core/runtime.cc.
* Tune part debug
*Add: recover the code, fixed the commit error.
*Add: some annotations in tune function
* clang-format test
* Fix: mem leak in CUDA Runtime and Conv
* Fix: sync in conv and default sync in timeit
* Change the way to tune operator conv.
Timeit function cudNNUnfused -> Timeit function cudnnConvolutionForward.
* Change: merge the common part of cudnnunfused&tune into cudnndescriptoraccess
* clang test
* clang-format
* clang-format bash.
* Added operator G2BMM and corresponding testcase.
*Added files related to operator G2BMM creating&calling.
*Added custom_ops.cuh&custom_op.h.
* Add operator GBMML
* new version
* Fix: G2BMM and GBMM kernel bugs
* Added testcase of operator GBMML
* clang format
* Added cmake option REQUIRE_GCC9
* Delete redundant file
* Renamed class GBMML into GBMM
* clang format
* Reviewed.
* Added cudahostcompiler option.
* Add: explicit CMAKE_CUDA_HOST_COMPILER
* Rename gbmm kernel
* Fix: nvcc warning in GBMM and G2BMM
Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-08 21:31:35 +08:00
Hardy
32a01efbbe
add code for backtrace ( #21 )
* add code for backtrace
* Add: infini::Exception
```
Test project /home/zly/InfiniTensor_aux/build
Start 1: test_graph
1/4 Test #1 : test_graph ....................... Passed 0.05 sec
Start 2: test_hash
2/4 Test #2 : test_hash ........................ Passed 0.02 sec
Start 3: test_conv
3/4 Test #3 : test_conv ........................ Passed 4.40 sec
Start 4: test_pooling
4/4 Test #4 : test_pooling ..................... Passed 2.47 sec
100% tests passed, 0 tests failed out of 4
Total Test time (real) = 6.94 sec
```
* Fix: USE_BACKTRACE in cmake
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-01 20:30:12 +08:00
wendy12022
48293576c0
Add maxpool and avgpool operators ( #17 )
* ADD:maxpool&&avgpool operators.
add OperatorObj::getDType()
clang format
FIX:timeit API has changed.
* Fix: Tensor::getInputs is const method
* Chore
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-31 14:44:53 +08:00
Anmuliar
e076991f2f
Revert "Operator serialization ( #14 )" ( #15 )
This reverts commit 25f0c441d2.
2022-08-29 16:02:48 +08:00
Anmuliar
25f0c441d2
Operator serialization ( #14 )
Class "Cuda Runtime" fulfills function "tune" and adds corresponding testcase.
*Add: convCudnn::tune, convCudnn::cuDNNdescriptorAccess.
*Add: testcase tune.
*Fix: a minor bug in CPU Runtime.
2022-08-29 15:59:03 +08:00
zhengly123
04ea5eed38
Add CUDA runtime ( #6 )
* Fix: add warm-up and repetition in timing
* Add: CUDA runtime and float support
* Refactor: Cuda and Cpu runtimes inherit Runtime
* Add: environment script for Lotus
* Add: Lotus build instructions
* Update README.md
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-22 15:01:03 +08:00
zhengly123
9303ddda8e
Add Conv operator and naive CPU implementation ( #5 )
* Add: Conv definition
* Add: tensor copy data from vector
* Add: CPU conv kernel
* Fix: replace Int32 with UInt32 in DataType
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-17 14:16:01 +08:00
Liyan Zheng
8b685ae4a6
Update: OpAttrs -> OpPerfKey
2022-08-09 14:58:45 +08:00
Liyan Zheng
b7e2096a26
Add: nnet code
2022-08-08 16:02:07 +08:00
Liyan Zheng
6c356d5b42
Add: kernel registry and naive Matmul kernel
2022-08-06 15:58:40 +08:00
Liyan Zheng
e6101b0336
Add: graph, tensor, and operator
2022-07-31 21:44:03 +08:00