YdrMaster
4ffaa44c1e
fix: Matmul 支持 2 维或以上的输入
...
> 现在能导入 resnet18
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-03-15 17:23:32 +08:00
YdrMaster
a27391fcdc
fix: 修正 batchNorm 实现
...
- onnx 和 pytorch 认为 batchNorm 的 4 个参数是 [c] 形状的,cuDNN 可能认为是 [1,c,1,...]。
优化已改为 [c],但 cuDNN 推理没有改;
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-03-15 17:23:32 +08:00
Haojie Wang
0f52d04882
Merge branch 'master' into dev-onnx
2023-03-15 14:52:03 +08:00
YdrMaster
978269162a
fix: 移除 c++ 中的中文注释,python TODO 改 FIXME
...
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-03-15 14:48:39 +08:00
deathwings602
40d1b1c91b
Add ConvTransposedNHWC ( #67 )
...
* Add: IT_ASSERT_TODO
* [WIP] Add: ConvTranspose2d mutation test
* add ConvTransposedNHWC
* fix test_cuda_transposed_2d
---------
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>
2023-03-01 14:15:02 +08:00
YdrMaster
315763a83a
feat: 前端支持 pad 及单元测试
...
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-02-15 11:41:06 +08:00
YdrMaster
7893ae0cca
opt: 优化 PadObj 和 SplitObj 构造器实现
...
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-02-15 11:28:49 +08:00
YdrMaster
f9d0076a86
opt: 优化 SliceObj 构造器实现
...
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-02-14 16:44:08 +08:00
YdrMaster
341cf1f943
feat: 前端支持 pool 及单元测试
...
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-02-14 16:26:47 +08:00
YdrMaster
62ceb78ae3
feat: 前端支持 reduceMean 及单元测试
...
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-02-14 15:35:01 +08:00
YdrMaster
fb9d84dbb7
opt: 优化 ReduceMeanObj 构造器实现
...
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-02-14 15:14:28 +08:00
YdrMaster
d11fb0ad5f
feat: 前端支持 gather 及单元测试
...
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-02-14 14:16:01 +08:00
YdrMaster
7626efbfa8
feat: 前端支持 reshape
...
- 无法测试,因为后端不支持 shape 的 INT64 类型
opt: ReshapeObj 构造改为全部传值并在内部 move
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-02-14 09:51:11 +08:00
wendy12022
d780f687fc
ADD: reconfig ResizeObj, support "tf_crop_and_resize " and cubic coeff kernel. ( #59 )
...
add cubic coef
add tf_crop_and_resize
2022-12-24 04:02:21 +08:00
wendy12022
c5966f8d81
Add: resize operator and cuda kernel,support nearest/linear coef. ( #51 )
...
ADD: resize operator and cuda kernel,support nearest/linear coef.
fix some
fix tests
add more tests for linear mode.
add linear coef mode.
add scales
add tests
fix tests.
add notLarger notSmaller
fix
add test
ADD:resize operator and cuda kernel
2022-11-14 09:30:22 +08:00
zhengly123
63d8aff985
Fix: cuCtxCreate before other initialization ( #49 )
...
Fix: create cuCtx at the very beginning
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-10-19 15:41:48 +08:00
wendy12022
d1c913010f
ADD:reduce_mean operator and cuda kernel. ( #47 )
...
add new line at file ending.
2022-10-15 16:53:58 +08:00
wendy12022
a4d6426589
ADD: batch norm operator and cuda kernel. ( #44 )
...
fix numInputs of batchNorm, add new line in file ending.
ADD: batch norm operator and cuda kernel.
add training
remove comments.
fix compile error.
add batch norm operator and cuda kernel.
2022-10-15 16:29:28 +08:00
wendy12022
26cee55e81
ADD:extend operator and cuda kernel. ( #40 )
...
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2022-09-29 14:52:50 +08:00
wendy12022
fe14c91f54
ADD: Gather operator and cuda kernel. ( #41 )
...
fix a memory leak.
add tests.
ADD gather cuda kernel.
ADD gather operator
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2022-09-29 14:44:20 +08:00
wendy12022
3c6e208f42
ADD:concat/split operator and cuda kernels ( #29 )
...
* ADD:concat/split operator and cuda kernels
refector
minor change comment
ADD:concat/split operator and cuda kernels
merge split_kernel and concat_kernel to split_concat_kernel.
Revert "fix"
This reverts commit 459926be09a838658ec55f1e0a72b3cf17037d5c.
fix
ADD:concat/split operator and cuda kernels
change whole tensor name to composed tensor
fix some
remove unused header.
rebase
add CudaKernel
add test for split.
ADD split operator and cuda kernel.
modify test.
ADD:concat operator and cuda kernel.
ADD:concat/split operator and cuda kernels
fix some
remove unused header.
rebase
add CudaKernel
ADD:concat/split operator and cuda kernels
add test for split.
ADD split operator and cuda kernel.
modify test.
ADD:concat operator and cuda kernel.
* remove extra comment; typo fix.
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2022-09-29 11:01:30 +08:00
wendy12022
5560d0f2fb
ADD:pad/slice operator and cuda kernel. ( #39 )
...
fix compile error
refector
clang format
split test.
fix compile error.
ADD slice cuda kernel.
ADD slice operator.
ADD:pad operator and cuda kernel.
2022-09-29 10:29:24 +08:00
wendy12022
9032cbb973
Add: reshape/flatten/identity OP and cuda kernel ( #34 )
...
* ADD:reshape/flatten/identity operators and cuda kernel.
fix: use cudaMemcpyAsync
clang format.
ADD flatten/identity operator.
add test for reshape.
ADD: reshape operator and cuda kernel.
* Fix: seperate CUDA tests & remove old header
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-21 14:04:30 +08:00
zhengly123
2f8f706f1c
Fix CMake USE_CUDA ( #36 )
...
* Fix: build lib without cuda
* Chore: rename GBMM and G2BMM files
* Fix: seperate CUDA tests from operator tests
* Fix: CMake CMP0104
* Chore: fix typo
* Chore: remove unused headers
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-21 12:28:00 +08:00
zhengly123
8f67a5cc76
Add: ConvTransposed ( #33 )
...
* Add: convTransposed2d operator
* Fix: IT_ASSERT namespace
* Add: nullptr check in as for Ref
* Fix: conv transpose operator and kernel
* Fix: makes PerfEngine singleton
* Add: ConvTransposed test
* Fix: rebase to master (PerfRecord shared_ptr)
* Revert: Ref with nullptr check
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-19 15:05:39 +08:00
Hardy
6ac106cba4
Add activation operators and kernels
...
* add code for activation operation
* add code for activation operation on GPU
* add test code for activation operation
* add code for activation operation
* add code for activation on gpu ,use cudnn
* add code for activation on GPU use cudnn
* Chore: add constants.h and remove comments
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-16 13:58:57 +08:00
zhengly123
172d03d6f2
Fix NNet tests after migration ( #27 )
...
* Fix: interpreter
```
4 - readlog (Failed)
8 - test_TConv2gemm (Failed)
11 - test_conv2conv (Failed)
12 - test_conv2gemm (Failed)
15 - test_g2bmm (Failed)
16 - test_guidedDLT (Subprocess aborted)
22 - test_mergeStage (Subprocess aborted)
```
* Exclude readlog from ctest
* Fix: change the path of logs
```
85% tests passed, 4 tests failed out of 27
Total Test time (real) = 100.69 sec
The following tests FAILED:
10 - test_conv2conv (Timeout)
11 - test_conv2gemm (Timeout)
15 - test_guidedDLT (Subprocess aborted)
21 - test_mergeStage (Subprocess aborted)
Errors while running CTest
```
- test_conv2conv 38529 ms total
- test_conv2gemm 37098 ms total
* Fix: test_mergeStage
* Fix: test_guidedDLT
```
Start 1: test_graph
1/27 Test #1 : test_graph ....................... Passed 0.05 sec
Start 2: test_hash
2/27 Test #2 : test_hash ........................ Passed 0.02 sec
Start 3: test_conv
3/27 Test #3 : test_conv ........................ Passed 4.98 sec
Start 4: test_Interpreter
4/27 Test #4 : test_Interpreter ................. Passed 6.30 sec
Start 5: test_OpSearch
5/27 Test #5 : test_OpSearch .................... Passed 0.02 sec
Start 6: test_Rule2VariableMerging
6/27 Test #6 : test_Rule2VariableMerging ........ Passed 0.03 sec
Start 7: test_TConv2gemm
7/27 Test #7 : test_TConv2gemm .................. Passed 29.45 sec
Start 8: test_as_tvm
8/27 Test #8 : test_as_tvm ...................... Passed 0.02 sec
Start 9: test_compareFormulas
9/27 Test #9 : test_compareFormulas ............. Passed 0.02 sec
Start 10: test_conv2conv
10/27 Test #10 : test_conv2conv ................... Passed 36.55 sec
Start 11: test_conv2gemm
11/27 Test #11 : test_conv2gemm ................... Passed 39.70 sec
Start 12: test_dlt
12/27 Test #12 : test_dlt ......................... Passed 0.03 sec
Start 13: test_exprHash
13/27 Test #13 : test_exprHash .................... Passed 0.02 sec
Start 14: test_g2bmm
14/27 Test #14 : test_g2bmm ....................... Passed 0.16 sec
Start 15: test_guidedDLT
15/27 Test #15 : test_guidedDLT ................... Passed 0.07 sec
Start 16: test_matchConv
16/27 Test #16 : test_matchConv ................... Passed 0.02 sec
Start 17: test_matchElementWise
17/27 Test #17 : test_matchElementWise ............ Passed 0.03 sec
Start 18: test_matchMatmul
18/27 Test #18 : test_matchMatmul ................. Passed 0.02 sec
Start 19: test_matchReshape
19/27 Test #19 : test_matchReshape ................ Passed 0.02 sec
Start 20: test_memboundOp
20/27 Test #20 : test_memboundOp .................. Passed 0.02 sec
Start 21: test_mergeStage
21/27 Test #21 : test_mergeStage .................. Passed 0.02 sec
Start 22: test_oobChecker
22/27 Test #22 : test_oobChecker .................. Passed 0.02 sec
Start 23: test_rangeMagnify
23/27 Test #23 : test_rangeMagnify ................ Passed 0.02 sec
Start 24: test_relaxation
24/27 Test #24 : test_relaxation .................. Passed 0.02 sec
Start 25: test_serializer
25/27 Test #25 : test_serializer .................. Passed 0.03 sec
Start 26: test_simplify
26/27 Test #26 : test_simplify .................... Passed 0.02 sec
Start 27: test_subset
27/27 Test #27 : test_subset ...................... Passed 0.01 sec
100% tests passed, 0 tests failed out of 27
Total Test time (real) = 117.72 sec
```
* Fix: format
* Replace nnet:Ref with infini::Ref
```
Start 1: test_graph
1/27 Test 1: test_graph ....................... Passed 0.02 sec
Start 2: test_hash
2/27 Test 2: test_hash ........................ Passed 0.02 sec
Start 3: test_conv
3/27 Test 3: test_conv ........................ Passed 4.45 sec
Start 4: test_Interpreter
4/27 Test 4: test_Interpreter ................. Passed 4.37 sec
Start 5: test_OpSearch
5/27 Test 5: test_OpSearch .................... Passed 0.02 sec
Start 6: test_Rule2VariableMerging
6/27 Test 6: test_Rule2VariableMerging ........ Passed 0.02 sec
Start 7: test_TConv2gemm
7/27 Test 7: test_TConv2gemm .................. Passed 23.40 sec
Start 8: test_as_tvm
8/27 Test 8: test_as_tvm ...................... Passed 0.02 sec
Start 9: test_compareFormulas
9/27 Test 9: test_compareFormulas ............. Passed 0.01 sec
Start 10: test_conv2conv
10/27 Test 10: test_conv2conv ................... Passed 32.28 sec
Start 11: test_conv2gemm
11/27 Test 11: test_conv2gemm ................... Passed 29.41 sec
Start 12: test_dlt
12/27 Test 12: test_dlt ......................... Passed 0.02 sec
Start 13: test_exprHash
13/27 Test 13: test_exprHash .................... Passed 0.01 sec
Start 14: test_g2bmm
14/27 Test 14: test_g2bmm ....................... Passed 0.14 sec
Start 15: test_guidedDLT
15/27 Test 15: test_guidedDLT ................... Passed 0.06 sec
Start 16: test_matchConv
16/27 Test 16: test_matchConv ................... Passed 0.02 sec
Start 17: test_matchElementWise
17/27 Test 17: test_matchElementWise ............ Passed 0.02 sec
Start 18: test_matchMatmul
18/27 Test 18: test_matchMatmul ................. Passed 0.02 sec
Start 19: test_matchReshape
19/27 Test 19: test_matchReshape ................ Passed 0.01 sec
Start 20: test_memboundOp
20/27 Test 20: test_memboundOp .................. Passed 0.02 sec
Start 21: test_mergeStage
21/27 Test 21: test_mergeStage .................. Passed 0.01 sec
Start 22: test_oobChecker
22/27 Test 22: test_oobChecker .................. Passed 0.01 sec
Start 23: test_rangeMagnify
23/27 Test 23: test_rangeMagnify ................ Passed 0.01 sec
Start 24: test_relaxation
24/27 Test 24: test_relaxation .................. Passed 0.01 sec
Start 25: test_serializer
25/27 Test 25: test_serializer .................. Passed 0.02 sec
Start 26: test_simplify
26/27 Test 26: test_simplify .................... Passed 0.01 sec
Start 27: test_subset
27/27 Test 27: test_subset ...................... Passed 0.00 sec
100% tests passed, 0 tests failed out of 27
Total Test time (real) = 94.47 sec
```
* Relax time limit for CPU conv
```
Start 1: test_graph
1/29 Test 1: test_graph ....................... Passed 0.02 sec
Start 2: test_hash
2/29 Test 2: test_hash ........................ Passed 0.02 sec
Start 3: test_conv
3/29 Test 3: test_conv ........................ Passed 4.47 sec
Start 4: test_matmul
4/29 Test 4: test_matmul ...................... Passed 2.61 sec
Start 5: test_pooling
5/29 Test 5: test_pooling ..................... Passed 2.57 sec
Start 6: test_Interpreter
6/29 Test 6: test_Interpreter ................. Passed 4.35 sec
Start 7: test_OpSearch
7/29 Test 7: test_OpSearch .................... Passed 0.02 sec
Start 8: test_Rule2VariableMerging
8/29 Test 8: test_Rule2VariableMerging ........ Passed 0.02 sec
Start 9: test_TConv2gemm
9/29 Test 9: test_TConv2gemm .................. Passed 23.32 sec
Start 10: test_as_tvm
10/29 Test 10: test_as_tvm ...................... Passed 0.02 sec
Start 11: test_compareFormulas
11/29 Test 11: test_compareFormulas ............. Passed 0.02 sec
Start 12: test_conv2conv
12/29 Test 12: test_conv2conv ................... Passed 32.12 sec
Start 13: test_conv2gemm
13/29 Test 13: test_conv2gemm ................... Passed 30.59 sec
Start 14: test_dlt
14/29 Test 14: test_dlt ......................... Passed 0.02 sec
Start 15: test_exprHash
15/29 Test 15: test_exprHash .................... Passed 0.01 sec
Start 16: test_g2bmm
16/29 Test 16: test_g2bmm ....................... Passed 0.14 sec
Start 17: test_guidedDLT
17/29 Test 17: test_guidedDLT ................... Passed 0.07 sec
Start 18: test_matchConv
18/29 Test 18: test_matchConv ................... Passed 0.02 sec
Start 19: test_matchElementWise
19/29 Test 19: test_matchElementWise ............ Passed 0.02 sec
Start 20: test_matchMatmul
20/29 Test 20: test_matchMatmul ................. Passed 0.02 sec
Start 21: test_matchReshape
21/29 Test 21: test_matchReshape ................ Passed 0.02 sec
Start 22: test_memboundOp
22/29 Test 22: test_memboundOp .................. Passed 0.02 sec
Start 23: test_mergeStage
23/29 Test 23: test_mergeStage .................. Passed 0.01 sec
Start 24: test_oobChecker
24/29 Test 24: test_oobChecker .................. Passed 0.02 sec
Start 25: test_rangeMagnify
25/29 Test 25: test_rangeMagnify ................ Passed 0.02 sec
Start 26: test_relaxation
26/29 Test 26: test_relaxation .................. Passed 0.02 sec
Start 27: test_serializer
27/29 Test 27: test_serializer .................. Passed 0.03 sec
Start 28: test_simplify
28/29 Test 28: test_simplify .................... Passed 0.02 sec
Start 29: test_subset
29/29 Test 29: test_subset ...................... Passed 0.00 sec
100% tests passed, 0 tests failed out of 29
Total Test time (real) = 100.65 sec
```
* Remove out-of-date tests
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-13 15:17:22 +08:00
wendy12022
13b7a2604b
ADD add/mul/sub/div/pow operators and CPU/CUDA kernels ( #26 )
...
Fix some
remove useless code.
add div/pow kernel
Add add/mul/sub operators.
fix cpu kernel.
add element wise kenerl for cuda.
ADD element wise operator.
2022-09-09 13:43:59 +08:00
Anmuliar
0409eafb5f
Operators g2bmm&gbmm transplantation ( #24 )
...
* Function tune and corresponding testcase.
*Add: Tune function in /src/kernel/cuda/conv.cc and corresponding testcase in test_conv.
*Fix: A little bug of perfRecord using in /src/core/runtime.cc.
* Tune part debug
*Add: recover the code, fixed the commit error.
*Add: some anotations in tune function
* clang formmat test
* Fix: mem leak in CUDA Runtime and Conv
* Fix: sync in conv and default sync in timeit
* Change the way to tune operator conv.
Timeit function cudNNUnfused -> Timeit function cudnnConvolutionForward.
* Change: merge the common part of cudnnunfused&tune into cudnndescriptoraccess
* clang test
* clang-format
* clang-format bash.
* Added operator G2BMM and corresponding testcase.
*Added files related to operator G2BMM creating&calling.
*Added custom_ops.cuh&custom_op.h.
* Add operator GBMML
* new version
* Fix: G2BMM and GBMM kernel bugs
* Added testcase of operator GBMML
* clang format
* Added cmake option REQUIRE_GCC9
* Delete redundent file
* Renamed class GBMML into GBMM
* clang format
* Reviewed.
* Added cudahostcompier option.
* Add: explicit CMAKE_CUDA_HOST_COMPILER
* Rename gbmm kernel
* Fix: nvcc warning in GBMM and G2BMM
Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-08 21:31:35 +08:00
wendy12022
48293576c0
Add maxpool and avgpool operators ( #17 )
...
* ADD:maxpool&&avgpool operators.
add OperatorObj::getDType()
clang format
FIX:timeit API has changed.
* Fix: Tensor::getInputs is const method
* Chore
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-31 14:44:53 +08:00
zhengly123
93f86d3f4d
Simplify tensor transfer between CPU and CUDA ( #10 )
...
* Add: OP infers data type & Graph clones tensor
* Fix: vecToString format
* Add: static assert for Tensor methods
* Rename: getDataRawPtr -> getRawDataPtr
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-25 11:29:16 +08:00
zhengly123
9303ddda8e
Add Conv operator and naive CPU implemenation ( #5 )
...
* Add: Conv definition
* Add: tensor copy data from vector
* Add: CPU conv kernel
* Fix: replace Int32 with UInt32 in DataType
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-17 14:16:01 +08:00
zhengly123
a26890abce
Tensor hash and inferShape ( #4 )
...
* Refactor: operator hash and inferShape
* Add: hash without shape
* Add: inferShape interface for given input tensors
* Add: construct outputs in op ctor
* Add: comments for matmul
* Add: opType in AttrVector and WorkloadVector
* Chore: _graph -> graph in Op ctor
* Chore: change the "Node" suffix to "Obj"
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-15 15:08:56 +08:00
Liyan Zheng
2054b0eda4
Chore: rename getOpAttrs to getOpPerfKey
2022-08-09 15:34:28 +08:00
Liyan Zheng
8b685ae4a6
Update: OpAttrs -> OpPerfKey
2022-08-09 14:58:45 +08:00
Liyan Zheng
1205240218
Add: mutator abstract class
2022-08-08 15:54:17 +08:00