Commit Graph

20 Commits

Author SHA1 Message Date
wendy12022 86ec4036ce
ADD: add mkl runtime for Intel CPU, and add mkl kernels for matmul/conv/convtransposed. (#61)
* move memory format transformation to TensorObj

clang format

add MemoryFormat for tensorObj.

use post_ops for fused conv/deconv

Distinguish mkl op_timer from cuda op timer.

add act optype to conv and deconv

add operator timer

add mkl kernel for convTransposed

minor fix for group conv

do not use cblas_sgemm_batch

CpuRuntimeObj->NativeCpuRuntimeObj

add matmul op for mkl

* fix: fix bugs when rebasing from master

fix: fix bugs when rebasing from master

* fix: update api after rebasing

* fix: fix format; fix onnx import

* fix: fix clang-format

* [fix] fix conv_transpose test

* [fix] use stronger test case for transposed conv

* [fix] remove tensor memory format; fix mkl transpose conv

* [fix] add FIXME tag for op_timer python api

---------

Co-authored-by: whjthu <haojie0429@gmail.com>
2023-03-27 21:28:49 +08:00
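YdrMaster 296fcc5aa0 feat: create the pyinfinitensor frontend
- Python frontend project structure, plus packaging and installation scripts
- rename the .so compiled from the backend to backend; add GraphHandler to modify the graph structure
- CI support for testing these features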

Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-02-13 09:19:05 +08:00
Hardy b0c2a08252
Support bang c kernel wanghailu 0927 (#43)
* fix a minor bug found by a newer version of CMake

* add code to support BangC language kernels, like CUDA kernels, not a
library

* add bangc kernel

* support BangC kernel

* add code for support BangC kernel

* support bangc kernel

* fix some code from reviewer

* fix code of template function

* add code for support bangc kernel

* fix bangc format

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2022-09-30 11:01:52 +08:00
zhengly123 1aefc1b27e
Add python interface for CUDA operator evaluation (#42)
* Refactor: separate data generator

* Add: python bindings for opTimer

* Fix: test_perfengine

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-27 10:41:12 +08:00
deathwings602 11d5aa1ccc
Add TVM codegen for MemboundOp (#35)
* Add: interface for membound TVM kernel and test

* add getAnsorCode

* add evaluation, but link failed

* add evaluation of kernel, but link failed

* Fix: link libcuda and nvrtc

* add print

* Add: const for source of copy

* compile and evaluate the kernel

* add compute

* fix gen_ansor_op.py

* fix membound_TVM

* format and fix CMakeLists.txt

* fix memory leak

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>
2022-09-22 18:06:45 +08:00
Hardy c7c974f07a
Add bangc runtime and element-wise kernels
* add code for cambricon mlu, bang, cnnl

* add code for support cambricon mlu,cnnl,cnrt

* add code for support mlu

* add code for support cambricon cnnl

* add code for support mlu

* add code for mlu

* add code for mlu

* Update CMakeLists.txt

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: zhengly123 <zhengly123@outlook.com>
2022-09-22 16:57:39 +08:00
zhengly123 2f8f706f1c
Fix CMake USE_CUDA (#36)
* Fix: build lib without cuda

* Chore: rename GBMM and G2BMM files

* Fix: separate CUDA tests from operator tests

* Fix: CMake CMP0104

* Chore: fix typo

* Chore: remove unused headers

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-21 12:28:00 +08:00
zhengly123 172d03d6f2
Fix NNet tests after migration (#27)
* Fix: interpreter

```
          4 - readlog (Failed)
          8 - test_TConv2gemm (Failed)
         11 - test_conv2conv (Failed)
         12 - test_conv2gemm (Failed)
         15 - test_g2bmm (Failed)
         16 - test_guidedDLT (Subprocess aborted)
         22 - test_mergeStage (Subprocess aborted)
```

* Exclude readlog from ctest

* Fix: change the path of logs

```
85% tests passed, 4 tests failed out of 27

Total Test time (real) = 100.69 sec

The following tests FAILED:
         10 - test_conv2conv (Timeout)
         11 - test_conv2gemm (Timeout)
         15 - test_guidedDLT (Subprocess aborted)
         21 - test_mergeStage (Subprocess aborted)
Errors while running CTest
```

- test_conv2conv 38529 ms total
- test_conv2gemm 37098 ms total

* Fix: test_mergeStage

* Fix: test_guidedDLT

```
      Start  1: test_graph
 1/27 Test  #1: test_graph .......................   Passed    0.05 sec
      Start  2: test_hash
 2/27 Test  #2: test_hash ........................   Passed    0.02 sec
      Start  3: test_conv
 3/27 Test  #3: test_conv ........................   Passed    4.98 sec
      Start  4: test_Interpreter
 4/27 Test  #4: test_Interpreter .................   Passed    6.30 sec
      Start  5: test_OpSearch
 5/27 Test  #5: test_OpSearch ....................   Passed    0.02 sec
      Start  6: test_Rule2VariableMerging
 6/27 Test  #6: test_Rule2VariableMerging ........   Passed    0.03 sec
      Start  7: test_TConv2gemm
 7/27 Test  #7: test_TConv2gemm ..................   Passed   29.45 sec
      Start  8: test_as_tvm
 8/27 Test  #8: test_as_tvm ......................   Passed    0.02 sec
      Start  9: test_compareFormulas
 9/27 Test  #9: test_compareFormulas .............   Passed    0.02 sec
      Start 10: test_conv2conv
10/27 Test #10: test_conv2conv ...................   Passed   36.55 sec
      Start 11: test_conv2gemm
11/27 Test #11: test_conv2gemm ...................   Passed   39.70 sec
      Start 12: test_dlt
12/27 Test #12: test_dlt .........................   Passed    0.03 sec
      Start 13: test_exprHash
13/27 Test #13: test_exprHash ....................   Passed    0.02 sec
      Start 14: test_g2bmm
14/27 Test #14: test_g2bmm .......................   Passed    0.16 sec
      Start 15: test_guidedDLT
15/27 Test #15: test_guidedDLT ...................   Passed    0.07 sec
      Start 16: test_matchConv
16/27 Test #16: test_matchConv ...................   Passed    0.02 sec
      Start 17: test_matchElementWise
17/27 Test #17: test_matchElementWise ............   Passed    0.03 sec
      Start 18: test_matchMatmul
18/27 Test #18: test_matchMatmul .................   Passed    0.02 sec
      Start 19: test_matchReshape
19/27 Test #19: test_matchReshape ................   Passed    0.02 sec
      Start 20: test_memboundOp
20/27 Test #20: test_memboundOp ..................   Passed    0.02 sec
      Start 21: test_mergeStage
21/27 Test #21: test_mergeStage ..................   Passed    0.02 sec
      Start 22: test_oobChecker
22/27 Test #22: test_oobChecker ..................   Passed    0.02 sec
      Start 23: test_rangeMagnify
23/27 Test #23: test_rangeMagnify ................   Passed    0.02 sec
      Start 24: test_relaxation
24/27 Test #24: test_relaxation ..................   Passed    0.02 sec
      Start 25: test_serializer
25/27 Test #25: test_serializer ..................   Passed    0.03 sec
      Start 26: test_simplify
26/27 Test #26: test_simplify ....................   Passed    0.02 sec
      Start 27: test_subset
27/27 Test #27: test_subset ......................   Passed    0.01 sec

100% tests passed, 0 tests failed out of 27

Total Test time (real) = 117.72 sec
```

* Fix: format

* Replace nnet::Ref with infini::Ref

```
      Start  1: test_graph
 1/27 Test   1: test_graph .......................   Passed    0.02 sec
      Start  2: test_hash
 2/27 Test   2: test_hash ........................   Passed    0.02 sec
      Start  3: test_conv
 3/27 Test   3: test_conv ........................   Passed    4.45 sec
      Start  4: test_Interpreter
 4/27 Test   4: test_Interpreter .................   Passed    4.37 sec
      Start  5: test_OpSearch
 5/27 Test   5: test_OpSearch ....................   Passed    0.02 sec
      Start  6: test_Rule2VariableMerging
 6/27 Test   6: test_Rule2VariableMerging ........   Passed    0.02 sec
      Start  7: test_TConv2gemm
 7/27 Test   7: test_TConv2gemm ..................   Passed   23.40 sec
      Start  8: test_as_tvm
 8/27 Test   8: test_as_tvm ......................   Passed    0.02 sec
      Start  9: test_compareFormulas
 9/27 Test   9: test_compareFormulas .............   Passed    0.01 sec
      Start 10: test_conv2conv
10/27 Test  10: test_conv2conv ...................   Passed   32.28 sec
      Start 11: test_conv2gemm
11/27 Test  11: test_conv2gemm ...................   Passed   29.41 sec
      Start 12: test_dlt
12/27 Test  12: test_dlt .........................   Passed    0.02 sec
      Start 13: test_exprHash
13/27 Test  13: test_exprHash ....................   Passed    0.01 sec
      Start 14: test_g2bmm
14/27 Test  14: test_g2bmm .......................   Passed    0.14 sec
      Start 15: test_guidedDLT
15/27 Test  15: test_guidedDLT ...................   Passed    0.06 sec
      Start 16: test_matchConv
16/27 Test  16: test_matchConv ...................   Passed    0.02 sec
      Start 17: test_matchElementWise
17/27 Test  17: test_matchElementWise ............   Passed    0.02 sec
      Start 18: test_matchMatmul
18/27 Test  18: test_matchMatmul .................   Passed    0.02 sec
      Start 19: test_matchReshape
19/27 Test  19: test_matchReshape ................   Passed    0.01 sec
      Start 20: test_memboundOp
20/27 Test  20: test_memboundOp ..................   Passed    0.02 sec
      Start 21: test_mergeStage
21/27 Test  21: test_mergeStage ..................   Passed    0.01 sec
      Start 22: test_oobChecker
22/27 Test  22: test_oobChecker ..................   Passed    0.01 sec
      Start 23: test_rangeMagnify
23/27 Test  23: test_rangeMagnify ................   Passed    0.01 sec
      Start 24: test_relaxation
24/27 Test  24: test_relaxation ..................   Passed    0.01 sec
      Start 25: test_serializer
25/27 Test  25: test_serializer ..................   Passed    0.02 sec
      Start 26: test_simplify
26/27 Test  26: test_simplify ....................   Passed    0.01 sec
      Start 27: test_subset
27/27 Test  27: test_subset ......................   Passed    0.00 sec

100% tests passed, 0 tests failed out of 27

Total Test time (real) =  94.47 sec
```

* Relax time limit for CPU conv

```
      Start  1: test_graph
 1/29 Test   1: test_graph .......................   Passed    0.02 sec
      Start  2: test_hash
 2/29 Test   2: test_hash ........................   Passed    0.02 sec
      Start  3: test_conv
 3/29 Test   3: test_conv ........................   Passed    4.47 sec
      Start  4: test_matmul
 4/29 Test   4: test_matmul ......................   Passed    2.61 sec
      Start  5: test_pooling
 5/29 Test   5: test_pooling .....................   Passed    2.57 sec
      Start  6: test_Interpreter
 6/29 Test   6: test_Interpreter .................   Passed    4.35 sec
      Start  7: test_OpSearch
 7/29 Test   7: test_OpSearch ....................   Passed    0.02 sec
      Start  8: test_Rule2VariableMerging
 8/29 Test   8: test_Rule2VariableMerging ........   Passed    0.02 sec
      Start  9: test_TConv2gemm
 9/29 Test   9: test_TConv2gemm ..................   Passed   23.32 sec
      Start 10: test_as_tvm
10/29 Test  10: test_as_tvm ......................   Passed    0.02 sec
      Start 11: test_compareFormulas
11/29 Test  11: test_compareFormulas .............   Passed    0.02 sec
      Start 12: test_conv2conv
12/29 Test  12: test_conv2conv ...................   Passed   32.12 sec
      Start 13: test_conv2gemm
13/29 Test  13: test_conv2gemm ...................   Passed   30.59 sec
      Start 14: test_dlt
14/29 Test  14: test_dlt .........................   Passed    0.02 sec
      Start 15: test_exprHash
15/29 Test  15: test_exprHash ....................   Passed    0.01 sec
      Start 16: test_g2bmm
16/29 Test  16: test_g2bmm .......................   Passed    0.14 sec
      Start 17: test_guidedDLT
17/29 Test  17: test_guidedDLT ...................   Passed    0.07 sec
      Start 18: test_matchConv
18/29 Test  18: test_matchConv ...................   Passed    0.02 sec
      Start 19: test_matchElementWise
19/29 Test  19: test_matchElementWise ............   Passed    0.02 sec
      Start 20: test_matchMatmul
20/29 Test  20: test_matchMatmul .................   Passed    0.02 sec
      Start 21: test_matchReshape
21/29 Test  21: test_matchReshape ................   Passed    0.02 sec
      Start 22: test_memboundOp
22/29 Test  22: test_memboundOp ..................   Passed    0.02 sec
      Start 23: test_mergeStage
23/29 Test  23: test_mergeStage ..................   Passed    0.01 sec
      Start 24: test_oobChecker
24/29 Test  24: test_oobChecker ..................   Passed    0.02 sec
      Start 25: test_rangeMagnify
25/29 Test  25: test_rangeMagnify ................   Passed    0.02 sec
      Start 26: test_relaxation
26/29 Test  26: test_relaxation ..................   Passed    0.02 sec
      Start 27: test_serializer
27/29 Test  27: test_serializer ..................   Passed    0.03 sec
      Start 28: test_simplify
28/29 Test  28: test_simplify ....................   Passed    0.02 sec
      Start 29: test_subset
29/29 Test  29: test_subset ......................   Passed    0.00 sec

100% tests passed, 0 tests failed out of 29

Total Test time (real) = 100.65 sec
```

* Remove out-of-date tests

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-13 15:17:22 +08:00
Hardy 03de74f4bc
Tensor serialization (#25)
* use protobuf to save, write, and read tensor data (serialization and deserialization)

* add protobuf

* add code for tensor load & save from/to file

* add code for tensor load & save

* add code for tensor load & save

* add code for tensor save & load

* add code for tensor save & load

* add code for save & load

* add code for load & save

* add code for tensor load & save

* add code for tensor save & load

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
2022-09-13 11:27:41 +08:00
Anmuliar 0409eafb5f
Operators g2bmm&gbmm transplantation (#24)
* Function tune and corresponding testcase.

*Add: Tune function in /src/kernel/cuda/conv.cc and corresponding testcase in test_conv.

*Fix: a minor bug in the use of perfRecord in /src/core/runtime.cc.

* Tune part debug

*Add: recover the code, fixed the commit error.

*Add: some annotations in tune function

* clang format test

* Fix: mem leak in CUDA Runtime and Conv

* Fix: sync in conv and default sync in timeit

* Change the way to tune operator conv.

Timeit function cudNNUnfused -> Timeit function cudnnConvolutionForward.

* Change: merge the common part of cudnnunfused&tune into cudnndescriptoraccess

* clang test

* clang-format

* clang-format bash.

* Added operator G2BMM and corresponding testcase.

*Added files related to operator G2BMM creating&calling.

*Added custom_ops.cuh&custom_op.h.

* Add operator GBMML

* new version

* Fix: G2BMM and GBMM kernel bugs

* Added testcase of operator GBMML

* clang format

* Added cmake option REQUIRE_GCC9

* Delete redundant file

* Renamed class GBMML into GBMM

* clang format

* Reviewed.

* Added CUDA host compiler option.

* Add: explicit CMAKE_CUDA_HOST_COMPILER

* Rename gbmm kernel

* Fix: nvcc warning in GBMM and G2BMM

Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-08 21:31:35 +08:00
Hardy 32a01efbbe
add code for backtrace (#21)
* add code for backtrace

* Add: infini::Exception

```
Test project /home/zly/InfiniTensor_aux/build
    Start 1: test_graph
1/4 Test #1: test_graph .......................   Passed    0.05 sec
    Start 2: test_hash
2/4 Test #2: test_hash ........................   Passed    0.02 sec
    Start 3: test_conv
3/4 Test #3: test_conv ........................   Passed    4.40 sec
    Start 4: test_pooling
4/4 Test #4: test_pooling .....................   Passed    2.47 sec

100% tests passed, 0 tests failed out of 4

Total Test time (real) =   6.94 sec
```

* Fix: USE_BACKTRACE in cmake

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-01 20:30:12 +08:00
wendy12022 48293576c0
Add maxpool and avgpool operators (#17)
* ADD:maxpool&&avgpool operators.

add OperatorObj::getDType()

clang format

FIX:timeit API has changed.

* Fix: Tensor::getInputs is const method

* Chore

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-31 14:44:53 +08:00
Anmuliar e076991f2f
Revert "Operator serialization (#14)" (#15)
This reverts commit 25f0c441d2.
2022-08-29 16:02:48 +08:00
Anmuliar 25f0c441d2
Operator serialization (#14)
Class "Cuda Runtime" implements function "tune" and adds a corresponding test case.

*Add: convCudnn::tune, convCudnn::cuDNNdescriptorAccess.

*Add: testcase tune.

*Fix: a minor bug in CPU Runtime.
2022-08-29 15:59:03 +08:00
zhengly123 04ea5eed38
Add CUDA runtime (#6)
* Fix: add warm-up and repetition in timing

* Add: CUDA runtime and float support

* Refactor: Cuda and Cpu runtimes inherit Runtime

* Add: environment script for Lotus

* Add: Lotus build instructions

* Update README.md

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-22 15:01:03 +08:00
zhengly123 9303ddda8e
Add Conv operator and naive CPU implementation (#5)
* Add: Conv definition

* Add: tensor copy data from vector

* Add: CPU conv kernel

* Fix: replace Int32 with UInt32 in DataType

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-17 14:16:01 +08:00
Liyan Zheng 8b685ae4a6 Update: OpAttrs -> OpPerfKey 2022-08-09 14:58:45 +08:00
Liyan Zheng b7e2096a26 Add: nnet code 2022-08-08 16:02:07 +08:00
Liyan Zheng 6c356d5b42 Add: kernel registry and naive Matmul kernel 2022-08-06 15:58:40 +08:00
Liyan Zheng e6101b0336 Add: graph, tensor, and operator 2022-07-31 21:44:03 +08:00