xgqdut2016
a7293c12ba
Add layer normalization ( #181 )
...
* - add layernorm kernel
* success:add layernorm kernel and test
* fix: remove unusalble comments
* fix: modify code as reviewer suggested
* debug,modified .cu and test
* optional bias support
* overloading function
* fix bug after merging; remove time constrain in conv test
---------
Co-authored-by: kilinchange <kilinchange@163.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-11-24 15:15:14 +08:00
wendy12022
86ec4036ce
ADD: add mkl runtime for intel cpu , and add mkl kernel for matmul/conv/convtransposed. ( #61 )
...
* move memory format transformation to TensorObj
clang format
add MemoryFormat for tensorObj.
use post_ops for fused conv/deconv
Distinguish mkl op_timer from cuda op timer.
add act optype to conv and deconv
add operator timer
add mkl kernel for convTransposed
minor fix for group conv
do not use cblas_sgemm_batch
CpuRuntimeObj->NativeCpuRuntimeObj
add matmul op for mkl
* fix: fix bugs when rebasing from master
fix: fix bugs when rebasing from master
* fix: update api after rebasing
* fix: fix format; fix onnx import
* fix: fix clang-format
* [fix] fix conv_transpose test
* [fix] use stronger test case for transposed conv
* [fix] remove tensor memory format; fix mkl transpose conv
* [fix] add FIXME tag for op_timer python api
---------
Co-authored-by: whjthu <haojie0429@gmail.com>
2023-03-27 21:28:49 +08:00
YdrMaster
9db97eb212
refactor: 整合操作张量数据的方法
...
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-03-21 14:00:04 +08:00
wendy12022
a4d6426589
ADD: batch norm operator and cuda kernel. ( #44 )
...
fix numInputs of batchNorm, add new line in file ending.
ADD: batch norm operator and cuda kernel.
add training
remove comments.
fix compile error.
add batch norm operator and cuda kernel.
2022-10-15 16:29:28 +08:00
zhengly123
2f8f706f1c
Fix CMake USE_CUDA ( #36 )
...
* Fix: build lib without cuda
* Chore: rename GBMM and G2BMM files
* Fix: seperate CUDA tests from operator tests
* Fix: CMake CMP0104
* Chore: fix typo
* Chore: remove unused headers
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-21 12:28:00 +08:00
zhengly123
172d03d6f2
Fix NNet tests after migration ( #27 )
...
* Fix: interpreter
```
4 - readlog (Failed)
8 - test_TConv2gemm (Failed)
11 - test_conv2conv (Failed)
12 - test_conv2gemm (Failed)
15 - test_g2bmm (Failed)
16 - test_guidedDLT (Subprocess aborted)
22 - test_mergeStage (Subprocess aborted)
```
* Exclude readlog from ctest
* Fix: change the path of logs
```
85% tests passed, 4 tests failed out of 27
Total Test time (real) = 100.69 sec
The following tests FAILED:
10 - test_conv2conv (Timeout)
11 - test_conv2gemm (Timeout)
15 - test_guidedDLT (Subprocess aborted)
21 - test_mergeStage (Subprocess aborted)
Errors while running CTest
```
- test_conv2conv 38529 ms total
- test_conv2gemm 37098 ms total
* Fix: test_mergeStage
* Fix: test_guidedDLT
```
Start 1: test_graph
1/27 Test #1 : test_graph ....................... Passed 0.05 sec
Start 2: test_hash
2/27 Test #2 : test_hash ........................ Passed 0.02 sec
Start 3: test_conv
3/27 Test #3 : test_conv ........................ Passed 4.98 sec
Start 4: test_Interpreter
4/27 Test #4 : test_Interpreter ................. Passed 6.30 sec
Start 5: test_OpSearch
5/27 Test #5 : test_OpSearch .................... Passed 0.02 sec
Start 6: test_Rule2VariableMerging
6/27 Test #6 : test_Rule2VariableMerging ........ Passed 0.03 sec
Start 7: test_TConv2gemm
7/27 Test #7 : test_TConv2gemm .................. Passed 29.45 sec
Start 8: test_as_tvm
8/27 Test #8 : test_as_tvm ...................... Passed 0.02 sec
Start 9: test_compareFormulas
9/27 Test #9 : test_compareFormulas ............. Passed 0.02 sec
Start 10: test_conv2conv
10/27 Test #10 : test_conv2conv ................... Passed 36.55 sec
Start 11: test_conv2gemm
11/27 Test #11 : test_conv2gemm ................... Passed 39.70 sec
Start 12: test_dlt
12/27 Test #12 : test_dlt ......................... Passed 0.03 sec
Start 13: test_exprHash
13/27 Test #13 : test_exprHash .................... Passed 0.02 sec
Start 14: test_g2bmm
14/27 Test #14 : test_g2bmm ....................... Passed 0.16 sec
Start 15: test_guidedDLT
15/27 Test #15 : test_guidedDLT ................... Passed 0.07 sec
Start 16: test_matchConv
16/27 Test #16 : test_matchConv ................... Passed 0.02 sec
Start 17: test_matchElementWise
17/27 Test #17 : test_matchElementWise ............ Passed 0.03 sec
Start 18: test_matchMatmul
18/27 Test #18 : test_matchMatmul ................. Passed 0.02 sec
Start 19: test_matchReshape
19/27 Test #19 : test_matchReshape ................ Passed 0.02 sec
Start 20: test_memboundOp
20/27 Test #20 : test_memboundOp .................. Passed 0.02 sec
Start 21: test_mergeStage
21/27 Test #21 : test_mergeStage .................. Passed 0.02 sec
Start 22: test_oobChecker
22/27 Test #22 : test_oobChecker .................. Passed 0.02 sec
Start 23: test_rangeMagnify
23/27 Test #23 : test_rangeMagnify ................ Passed 0.02 sec
Start 24: test_relaxation
24/27 Test #24 : test_relaxation .................. Passed 0.02 sec
Start 25: test_serializer
25/27 Test #25 : test_serializer .................. Passed 0.03 sec
Start 26: test_simplify
26/27 Test #26 : test_simplify .................... Passed 0.02 sec
Start 27: test_subset
27/27 Test #27 : test_subset ...................... Passed 0.01 sec
100% tests passed, 0 tests failed out of 27
Total Test time (real) = 117.72 sec
```
* Fix: format
* Replace nnet:Ref with infini::Ref
```
Start 1: test_graph
1/27 Test 1: test_graph ....................... Passed 0.02 sec
Start 2: test_hash
2/27 Test 2: test_hash ........................ Passed 0.02 sec
Start 3: test_conv
3/27 Test 3: test_conv ........................ Passed 4.45 sec
Start 4: test_Interpreter
4/27 Test 4: test_Interpreter ................. Passed 4.37 sec
Start 5: test_OpSearch
5/27 Test 5: test_OpSearch .................... Passed 0.02 sec
Start 6: test_Rule2VariableMerging
6/27 Test 6: test_Rule2VariableMerging ........ Passed 0.02 sec
Start 7: test_TConv2gemm
7/27 Test 7: test_TConv2gemm .................. Passed 23.40 sec
Start 8: test_as_tvm
8/27 Test 8: test_as_tvm ...................... Passed 0.02 sec
Start 9: test_compareFormulas
9/27 Test 9: test_compareFormulas ............. Passed 0.01 sec
Start 10: test_conv2conv
10/27 Test 10: test_conv2conv ................... Passed 32.28 sec
Start 11: test_conv2gemm
11/27 Test 11: test_conv2gemm ................... Passed 29.41 sec
Start 12: test_dlt
12/27 Test 12: test_dlt ......................... Passed 0.02 sec
Start 13: test_exprHash
13/27 Test 13: test_exprHash .................... Passed 0.01 sec
Start 14: test_g2bmm
14/27 Test 14: test_g2bmm ....................... Passed 0.14 sec
Start 15: test_guidedDLT
15/27 Test 15: test_guidedDLT ................... Passed 0.06 sec
Start 16: test_matchConv
16/27 Test 16: test_matchConv ................... Passed 0.02 sec
Start 17: test_matchElementWise
17/27 Test 17: test_matchElementWise ............ Passed 0.02 sec
Start 18: test_matchMatmul
18/27 Test 18: test_matchMatmul ................. Passed 0.02 sec
Start 19: test_matchReshape
19/27 Test 19: test_matchReshape ................ Passed 0.01 sec
Start 20: test_memboundOp
20/27 Test 20: test_memboundOp .................. Passed 0.02 sec
Start 21: test_mergeStage
21/27 Test 21: test_mergeStage .................. Passed 0.01 sec
Start 22: test_oobChecker
22/27 Test 22: test_oobChecker .................. Passed 0.01 sec
Start 23: test_rangeMagnify
23/27 Test 23: test_rangeMagnify ................ Passed 0.01 sec
Start 24: test_relaxation
24/27 Test 24: test_relaxation .................. Passed 0.01 sec
Start 25: test_serializer
25/27 Test 25: test_serializer .................. Passed 0.02 sec
Start 26: test_simplify
26/27 Test 26: test_simplify .................... Passed 0.01 sec
Start 27: test_subset
27/27 Test 27: test_subset ...................... Passed 0.00 sec
100% tests passed, 0 tests failed out of 27
Total Test time (real) = 94.47 sec
```
* Relax time limit for CPU conv
```
Start 1: test_graph
1/29 Test 1: test_graph ....................... Passed 0.02 sec
Start 2: test_hash
2/29 Test 2: test_hash ........................ Passed 0.02 sec
Start 3: test_conv
3/29 Test 3: test_conv ........................ Passed 4.47 sec
Start 4: test_matmul
4/29 Test 4: test_matmul ...................... Passed 2.61 sec
Start 5: test_pooling
5/29 Test 5: test_pooling ..................... Passed 2.57 sec
Start 6: test_Interpreter
6/29 Test 6: test_Interpreter ................. Passed 4.35 sec
Start 7: test_OpSearch
7/29 Test 7: test_OpSearch .................... Passed 0.02 sec
Start 8: test_Rule2VariableMerging
8/29 Test 8: test_Rule2VariableMerging ........ Passed 0.02 sec
Start 9: test_TConv2gemm
9/29 Test 9: test_TConv2gemm .................. Passed 23.32 sec
Start 10: test_as_tvm
10/29 Test 10: test_as_tvm ...................... Passed 0.02 sec
Start 11: test_compareFormulas
11/29 Test 11: test_compareFormulas ............. Passed 0.02 sec
Start 12: test_conv2conv
12/29 Test 12: test_conv2conv ................... Passed 32.12 sec
Start 13: test_conv2gemm
13/29 Test 13: test_conv2gemm ................... Passed 30.59 sec
Start 14: test_dlt
14/29 Test 14: test_dlt ......................... Passed 0.02 sec
Start 15: test_exprHash
15/29 Test 15: test_exprHash .................... Passed 0.01 sec
Start 16: test_g2bmm
16/29 Test 16: test_g2bmm ....................... Passed 0.14 sec
Start 17: test_guidedDLT
17/29 Test 17: test_guidedDLT ................... Passed 0.07 sec
Start 18: test_matchConv
18/29 Test 18: test_matchConv ................... Passed 0.02 sec
Start 19: test_matchElementWise
19/29 Test 19: test_matchElementWise ............ Passed 0.02 sec
Start 20: test_matchMatmul
20/29 Test 20: test_matchMatmul ................. Passed 0.02 sec
Start 21: test_matchReshape
21/29 Test 21: test_matchReshape ................ Passed 0.02 sec
Start 22: test_memboundOp
22/29 Test 22: test_memboundOp .................. Passed 0.02 sec
Start 23: test_mergeStage
23/29 Test 23: test_mergeStage .................. Passed 0.01 sec
Start 24: test_oobChecker
24/29 Test 24: test_oobChecker .................. Passed 0.02 sec
Start 25: test_rangeMagnify
25/29 Test 25: test_rangeMagnify ................ Passed 0.02 sec
Start 26: test_relaxation
26/29 Test 26: test_relaxation .................. Passed 0.02 sec
Start 27: test_serializer
27/29 Test 27: test_serializer .................. Passed 0.03 sec
Start 28: test_simplify
28/29 Test 28: test_simplify .................... Passed 0.02 sec
Start 29: test_subset
29/29 Test 29: test_subset ...................... Passed 0.00 sec
100% tests passed, 0 tests failed out of 29
Total Test time (real) = 100.65 sec
```
* Remove out-of-date tests
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-13 15:17:22 +08:00
Anmuliar
bd63f738dc
cuDNN conv tuning ( #16 )
...
* Function tune and corresponding testcase.
*Add: Tune function in /src/kernel/cuda/conv.cc and corresponding testcase in test_conv.
*Fix: A little bug of perfRecord using in /src/core/runtime.cc.
* Tune part debug
*Add: recover the code, fixed the commit error.
*Add: some anotations in tune function
* clang formmat test
* Fix: mem leak in CUDA Runtime and Conv
* Fix: sync in conv and default sync in timeit
* Change the way to tune operator conv.
Timeit function cudNNUnfused -> Timeit function cudnnConvolutionForward.
* Change: merge the common part of cudnnunfused&tune into cudnndescriptoraccess
* clang test
* clang-format
* clang-format bash.
* Chore: remove print and blank lines
Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-29 21:37:07 +08:00
Anmuliar
e076991f2f
Revert "Operator serialization ( #14 )" ( #15 )
...
This reverts commit 25f0c441d2
.
2022-08-29 16:02:48 +08:00
Anmuliar
25f0c441d2
Operator serialization ( #14 )
...
Class "Cuda Runtime" fulfills function "tune" and adds corresponding testcase.
*Add: convCudnn::tune, convCudnn::cuDNNdescriptorAccess.
*Add: testcase tune.
*Fix: a brief bug in CPU Runtime.
2022-08-29 15:59:03 +08:00
zhengly123
93f86d3f4d
Simplify tensor transfer between CPU and CUDA ( #10 )
...
* Add: OP infers data type & Graph clones tensor
* Fix: vecToString format
* Add: static assert for Tensor methods
* Rename: getDataRawPtr -> getRawDataPtr
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-25 11:29:16 +08:00
zhengly123
af08df32d2
Extended DataType class and Runtime interaction ( #9 )
...
* Add: DataType class
* Add: data-type-oblivious tensor interface
* Rename: copyBlobToCPU
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-23 16:55:59 +08:00
zhengly123
04ea5eed38
Add CUDA runtime ( #6 )
...
* Fix: add warm-up and repetition in timing
* Add: CUDA runtime and float support
* Refactor: Cuda and Cpu runtimes inherit Runtime
* Add: environment script for Lotus
* Add: Lotus build instructions
* Update README.md
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-22 15:01:03 +08:00
zhengly123
9303ddda8e
Add Conv operator and naive CPU implemenation ( #5 )
...
* Add: Conv definition
* Add: tensor copy data from vector
* Add: CPU conv kernel
* Fix: replace Int32 with UInt32 in DataType
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-17 14:16:01 +08:00