Commit Graph

61 Commits

Author SHA1 Message Date
Derui Yang 57ac94d893
refactor(core): add new `OpType` definition (#99)
* feat: add new OpType definition

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* refactor: replace the old OpType with the new one across the whole project

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: onnx import

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: fix issues in cuda and bang kernels

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: filter out bang tests

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: filter out bang tests

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix bang code.

* fix code on bang

* fmt

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: delete the specified files

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: delete two unused files and remove a comment of unknown purpose

Signed-off-by: YdrMaster <ydrml@hotmail.com>

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
2023-08-07 11:17:05 +08:00
zhangyunze 9b10a74788
Support fp16 dtype (#96)
* add conv_half kernel

* Conv Kernel FP16

* dcj:
replace "DataType::Float32" with "op->getDType()" to support more DataType

* feat: support Float16 dtype

* fix: set default clang-format to 14 version

* fix: apply changes per review comments

* fix: add data convert to convfp16 kernel test

* test: add conv_fp16 kernel test

---------

Co-authored-by: zhangyue207 <zhangyue@qiyuanlab.com>
Co-authored-by: kilinchange <kilinchange@163.com>
2023-08-02 16:38:16 +08:00
Derui Yang 1dc65e2788
build: implement a script to format git-added c/c++ sources (#98)
* build: implement a script to format git-added c/c++ sources

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: extend the set of c-style file types

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: format py files

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: support formatting all files modified since an arbitrary commit

Signed-off-by: YdrMaster <ydrml@hotmail.com>

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-07-21 12:29:50 +08:00
YdrMaster 26f0d13c26
Dev for 202303ddl (#66)
* add activation operations relu, tanh, sigmoid on mlu

* commit for format

* add activation backward operation

* add test for activation_backward

* add test

* add convbpfilter

* fix

* add transpose code and test

* add trigonometric function operations on mlu: sin, cos, tan, asin, sinh, asinh

* add copy operation on mlu

* add ceil operation and floor operation

* add operation clip

* add operation cnnl div, test and test for divdemo bangc kernel

* add divnonan operation and test

* add erf operation

* add exp operation

* add operation fill

* add log operation

* add log1p operation

* add l2loss operation

* add maximum and minimum operation

* add mseloss operation

* add negTensor operation

* add power operation

* add reciprocal operation

* add sqrt and rsqrt operation

* add transform operation

* add addn operation

* add muln operation

* cherry-pick some operations

* add floordiv operation and floordivtrunc operation

* add floormod operation

* add cumsum operation

* add det operation

* add pad operation

* format

* add concat operation

* format

* add split operation

* fix concat and split operation

* add round operation

* add pooling operation

* add square operation

* add squaredDifference operation

* code format fix

* add flip operation

* code format fix

* add hardtanh operation

* add logic operation

* add addcdiv and addcmul operation

* add arange operation

* add bitcompute operation

* add net test

* fmt

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* style: rename

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: replace CpuRuntime with NativeCpuRuntime

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix code

* fix code

* fix code by review suggestion

* remove operation which is not the onnx operation

* fix format

* clang format

* refactor: add a templated dataToString layer to tensor printing

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: onnx export

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: add computation graph optimization interface

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* add clip operation

* feat: support importing clip

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* test: add import/export tests to ci

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix batch norm

* feat: add Shape operator

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: support importing unsqueeze

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: fix clip interface

feat: support importing transpose
Signed-off-by: YdrMaster <ydrml@hotmail.com>

* add broadcast operation

* fix elementwise-broadcast

* fix elementwise broadcast

* add broadcast for gpu elementwise

* feat: pad supports negative axes

feat: export unsupported padding as a standalone pad operator

feat: support importing inception simplified by onnxsim
Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: fix pooling tests

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: export pads; support inception import/export, added to ci

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: support densenet import/export and add it to ci

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: import squeeze

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix softmax

* feat: export clip and transpose

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: support bias for Conv

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: bias of conv

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: bias of conv

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: import split

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: export split

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: conv

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: conv group

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: matmul bias was not included in the inputs; corrected

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix example

* fix: correct reduce_mean export

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* refactor: change slice implementation to match onnx

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* style: stop exporting two runtime functions

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* doc: Chinese user guide

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* doc: complete the guide

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: fix data import issue

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fmt

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: add basic Dropout structure, without support for two outputs of different types

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: re-export optimization interface

feat: import dropout
Signed-off-by: YdrMaster <ydrml@hotmail.com>

* build: add BANG option to Makefile

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix code: test/kernels/bang/test* is changed to use NativeCpuRuntime;
include/bang/bang_runtime is changed for the cntoolkit upgrade.

* feat: export bang runtime

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* add USE_BANG=1

* fix matmul

* fix reshape

* fix

* fix activation

* fix transpose

* format

* format

* update Makefile

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: support importing and exporting ConvTranspose

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* add prelu on mlu

* fix: ConvTranspose

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: support importing and exporting PRelu

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* add convtrans on mlu

* fmt

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* docs: update README_CN.md

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix code by review suggestions

* style

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: Softmax axis can use the default value? Seems to be nonstandard onnx

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix cuda & intelcpu bugs after merging

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: whjthu <haojie0429@gmail.com>
2023-04-18 15:10:33 +08:00
zhengly123 a1974aabcd
NNET supports TVM backend and kernels (#78)
* Add: mutator InfoGAN minimum test

* Add: cache and padding (bugs!!)

* Add: expression reader as a cmake target

* Fix: [Intermediate] NMutator::expressionToGraph

To be fixed: matmul with implicit broadcast

* Add: matmul broadcast

* Fix: GraphObj ctor should use cloneTensor

* Fix: cuBLAS failure when codegen is enabled

* Add: Exception for checkCuError

* Fix: graph OpList ctor

* Add: expr simplification for TVM

* Add: TVM headers and CMake include paths

* Add: CMake config

* Add: PackedFunc (broken)

* Fix: remove cuCtxCreate which makes TVM fail

* Fix: membound_tvm

* Fix: test_memboundOp

* Add: PRelu Expr and AsTVMVisitor

* Add: Random generator

* Add: support TVM packed function

* Fix: specify runtime

* Add: CMake support of TVM

* Add: detailed output of Matmul

* Add: comments for Matmul

* Chore: format and comments

* Chore: GraphObj::selfCheck without assert control

* Fix: CMAKE_CXX_FLAGS in CMakeLists

* fix merge bug

* update api for mkl batchnorm test

* fix lotus env

* fix header bug

---------

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>
Co-authored-by: whjthu <haojie0429@gmail.com>
2023-04-18 00:26:36 +08:00
wendy12022 43d4798323
ADD: sub graph replacement. (#56)
reconfig: connections among ops and tensors are now managed by GraphObj.

add some comments

merge from master

merge from master

ADD: sub graph replacement

reconfig inputs of op resize, due to the check of operator inputs.

ResizeObj::clone

clang format

fix some and add test for multi-output.

replacement support multi-inputs and multi-outputs.

add clone for all operators

add replaceSubGraph addSubGraph

remove extra code

add more test

remove extra print

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-04-17 13:09:07 +08:00
wendy12022 c8b2c8ed32
Cpu backend2 (#77)
fix review

change Device::MKL to Device::INTELCPU

fix mkl linkage

fix errors according to merge from master

now can call mkl backend

fix softmax/flatten with axis from onnx.

modify README.md

fix memory refree

add env_lotus_intelcpu.sh

fix compile

merge from branch cpu_backend

fix something add gather

fix something

FIX: directory rename from "mkl" to "intelcpu"

ADD: use oneMKL dpcpp interface to implement matmul kernel.

ADD: add dpcpp as compiler for mkl, and fix warnings for clang compiling.
add dpcpp kernel for pow.

ADD: mkl kernel for pad.

ADD: slice mkl kernel.

ADD: reshape/flatten/identity mkl kernel.

ADD: split mkl kernel.

fix compile error

FIX: fix flattenObj with axis.

ADD reduce_mean mkl kernel.

Add concat mkl kernel.

batchNorm for mkl kernel.

sigmoid mkl kernel.

ADD:add mkl kernel for pooling

add more tests for softmax

Now softmax cuda kernel supports any axis.

mkl kernel for softmax

softmax

add axis to softmax operator

add mkl kernel for abs tanh

ADD: relu kernel for mkl

fix binary mkl primitives.

add mkl kernel for binary operators

fix compiler error

move stream to runtime

clang format

add MemoryFormat for tensorObj.

use post_ops for fused conv/deconv

Distinguish mkl  op_timer from cuda op timer.

add act optype to conv and deconv

add operator timer

add mkl kernel for convTransposed

minor fix for group conv

do not use cblas_sgemm_batch

CpuRuntimeObj->NativeCpuRuntimeObj

add  matmul op for mkl
2023-04-17 12:15:23 +08:00
Hardy fe1afe38fa
fix code of bang conv (#76)
* fix code of bang conv

* test: also run ci on push to master

Signed-off-by: YdrMaster <ydrml@hotmail.com>

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: YdrMaster <ydrml@hotmail.com>
2023-03-29 15:47:32 +08:00
Hardy 823e66a9ff
Support perf bang 1115 (#57)
* support matmul

* add matmul

* add matmul

* add code for cnnl matmul operation and test

* add conv

* add code for conv test on mlu

* add code for test cnnl conv on mlu

* add code for perf conv and matmul on mlu

* clang format

* fix convolution operation

* fix CMakeLists

* code format

* fix code

* code format

---------

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: wanghailu <wanghailu0717@163.com>
2023-03-29 13:52:56 +08:00
wendy12022 86ec4036ce
ADD: add mkl runtime for intel cpu , and add mkl kernel for matmul/conv/convtransposed. (#61)
* move memory format transformation to TensorObj

clang format

add MemoryFormat for tensorObj.

use post_ops for fused conv/deconv

Distinguish mkl  op_timer from cuda op timer.

add act optype to conv and deconv

add operator timer

add mkl kernel for convTransposed

minor fix for group conv

do not use cblas_sgemm_batch

CpuRuntimeObj->NativeCpuRuntimeObj

add  matmul op for mkl

* fix: fix bugs when rebasing from master

fix: fix bugs when rebasing from master

* fix: update api after rebasing

* fix: fix format; fix onnx import

* fix: fix clang-format

* [fix] fix conv_transpose test

* [fix] use stronger test case for transposed conv

* [fix] remove tensor memory format; fix mkl transpose conv

* [fix] add FIXME tag for op_timer python api

---------

Co-authored-by: whjthu <haojie0429@gmail.com>
2023-03-27 21:28:49 +08:00
whjthu d9886e9de3 fix: remove inline keyword in class; rename getter and setter for inputOf and outputOf 2023-03-25 12:04:24 +08:00
YdrMaster 9db97eb212 refactor: consolidate methods for manipulating tensor data
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-03-21 14:00:04 +08:00
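YdrMaster a27391fcdc fix: fix batchNorm implementation
- onnx and pytorch treat the 4 batchNorm parameters as having shape [c], while cuDNN may treat them as [1,c,1,...].
The optimization has been changed to [c], but cuDNN inference has not been changed;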

Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-03-15 17:23:32 +08:00
YdrMaster 45a3cdfa30 feat: add a topological sort method to GraphObj, with tests
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-03-15 15:09:12 +08:00
Haojie Wang 0f52d04882
Merge branch 'master' into dev-onnx 2023-03-15 14:52:03 +08:00
deathwings602 40d1b1c91b
Add ConvTransposedNHWC (#67)
* Add: IT_ASSERT_TODO

* [WIP] Add: ConvTranspose2d mutation test

* add ConvTransposedNHWC

* fix test_cuda_transposed_2d

---------

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>
2023-03-01 14:15:02 +08:00
YdrMaster a7e58bd8d0 feat: extend DataType types
- added 6 algebraic types, matching the onnx indices
- reshape can now be imported

Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-02-14 11:27:57 +08:00
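YdrMaster 296fcc5aa0 feat: create pyinfinitensor frontend
- python frontend project structure plus packaging and installation scripts
- the .so compiled from the backend is renamed to backend; add GraphHandler to modify the graph structure
- ci supports testing these features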

Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-02-13 09:19:05 +08:00
zhengly123 c7ec9ee6e7
Add search engine (#64)
* Add: tensor fuid

* [Intermediate state] Add: Graph ctor for OpVec

* Add: clone for operators

* tmp: search_engine

* search: init search Engine.

* Add: dummy mutator for the test of search engine

* search: add print graph.

* search: add partition.

* search: update comments.

* Fix: remain FUID in Tensor::clone

* Chore: rename GUidBaseType to UidBaseType

* Fix: connect NMutator to SearchEngine

* Chore: output

* Fix test_memboundOp: nmutator uses input runtime

* Chore: clang-format

* Chore: clang-format

* Fix: comments in the review

---------

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: mazx <dyxdy@live.com>
2023-02-12 18:27:52 +08:00
wendy12022 d780f687fc
ADD: reconfig ResizeObj, support "tf_crop_and_resize" and cubic coeff kernel. (#59)
add cubic coef

add tf_crop_and_resize
2022-12-24 04:02:21 +08:00
wendy12022 c5966f8d81
Add: resize operator and cuda kernel,support nearest/linear coef. (#51)
ADD: resize operator and cuda kernel,support nearest/linear coef.

fix some

fix tests

add more tests for linear mode.

add linear coef mode.

add scales

add tests

fix tests.

add notLarger notSmaller

fix

add test

ADD:resize operator and cuda kernel
2022-11-14 09:30:22 +08:00
Zixuan Ma 00b2f18c17
Fix: unsigned compare in test (#50)
fix: unsigned compare in test.

Test project /home/mazx/git/InfiniTensor/build
      Start  1: test_graph
 1/18 Test  #1: test_graph .......................   Passed    0.03 sec
      Start  2: test_hash
 2/18 Test  #2: test_hash ........................   Passed    0.01 sec
      Start  3: test_tensor_save
 3/18 Test  #3: test_tensor_save .................   Passed    0.02 sec
      Start  4: test_verify
 4/18 Test  #4: test_verify ......................   Passed    0.01 sec
      Start  5: test_batch_norm
 5/18 Test  #5: test_batch_norm ..................   Passed    0.01 sec
      Start  6: test_concat
 6/18 Test  #6: test_concat ......................   Passed    0.01 sec
      Start  7: test_conv
 7/18 Test  #7: test_conv ........................   Passed    0.24 sec
      Start  8: test_conv_transposed_2d
 8/18 Test  #8: test_conv_transposed_2d ..........   Passed    0.01 sec
      Start  9: test_element_wise
 9/18 Test  #9: test_element_wise ................   Passed    0.01 sec
      Start 10: test_extend
10/18 Test #10: test_extend ......................   Passed    0.01 sec
      Start 11: test_gather
11/18 Test #11: test_gather ......................   Passed    0.01 sec
      Start 12: test_matmul
12/18 Test #12: test_matmul ......................   Passed    0.01 sec
      Start 13: test_pad
13/18 Test #13: test_pad .........................   Passed    0.01 sec
      Start 14: test_pooling
14/18 Test #14: test_pooling .....................   Passed    0.01 sec
      Start 15: test_reduce_mean
15/18 Test #15: test_reduce_mean .................   Passed    0.01 sec
      Start 16: test_reshape
16/18 Test #16: test_reshape .....................   Passed    0.01 sec
      Start 17: test_slice
17/18 Test #17: test_slice .......................   Passed    0.01 sec
      Start 18: test_split
18/18 Test #18: test_split .......................   Passed    0.02 sec

100% tests passed, 0 tests failed out of 18
2022-10-19 15:03:03 +08:00
zhengly123 4e0040c8a0
Add: connection among tensors and operators (#45)
* Add: refs_to_wrefs and wrefs_to_refs

* Add: op and tensor connection

* Add: inception-v3 block test

* Refactor: addOperatorAndConnect

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-10-18 22:02:51 +08:00
wendy12022 d1c913010f
ADD:reduce_mean operator and cuda kernel. (#47)
add new line at file ending.
2022-10-15 16:53:58 +08:00
wendy12022 a4d6426589
ADD: batch norm operator and cuda kernel. (#44)
fix numInputs of batchNorm, add new line in file ending.

ADD: batch norm operator and cuda kernel.

add training

remove comments.

fix compile error.

add batch norm operator and cuda kernel.
2022-10-15 16:29:28 +08:00
Hardy b0c2a08252
Support bang c kernel wanghailu 0927 (#43)
* fix a small bug found by the new version of CMake

* add code to support BangC language kernels, just like Cuda kernels, not
library

* add bangc kernel

* support BangC kernel

* add code for support BangC kernel

* support bangc kernel

* fix some code from reviewer

* fix code of template function

* add code for support bangc kernel

* fix bangc format

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2022-09-30 11:01:52 +08:00
wendy12022 26cee55e81
ADD:extend operator and cuda kernel. (#40)
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2022-09-29 14:52:50 +08:00
wendy12022 fe14c91f54
ADD: Gather operator and cuda kernel. (#41)
fix a memory leak.

add tests.

ADD gather cuda kernel.

ADD gather operator

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2022-09-29 14:44:20 +08:00
wendy12022 3c6e208f42
ADD:concat/split operator and cuda kernels (#29)
* ADD:concat/split operator and cuda kernels

refactor

minor change comment

ADD:concat/split operator and cuda kernels

merge split_kernel and concat_kernel to split_concat_kernel.

Revert "fix"

This reverts commit 459926be09a838658ec55f1e0a72b3cf17037d5c.

fix

ADD:concat/split operator and cuda kernels

change whole tensor name to composed tensor

fix some

remove unused header.

rebase

add CudaKernel

add test for split.

ADD split operator and cuda kernel.

modify test.

ADD:concat operator and cuda kernel.

ADD:concat/split operator and cuda kernels

fix some

remove unused header.

rebase

add CudaKernel

ADD:concat/split operator and cuda kernels

add test for split.

ADD split operator and cuda kernel.

modify test.

ADD:concat operator and cuda kernel.

* remove extra comment; typo fix.

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2022-09-29 11:01:30 +08:00
wendy12022 5560d0f2fb
ADD:pad/slice operator and cuda kernel. (#39)
fix compile error

refactor

clang format

split test.

fix compile error.

ADD slice cuda kernel.

ADD slice operator.

ADD:pad operator and cuda kernel.
2022-09-29 10:29:24 +08:00
zhengly123 1aefc1b27e
Add python interface for CUDA operator evaluation (#42)
* Refactor: separate data generator

* Add: python bindings for opTimer

* Fix: test_perfengine

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-27 10:41:12 +08:00
deathwings602 11d5aa1ccc
Add TVM codegen for MemboundOp (#35)
* Add:  interface for membound TVM kernel and test

* add getAnsorCode

* add evaluation, but link failed

* add evaluation of kernel, but link failed

* Fix: link libcuda and nvrtc

* add print

* Add: const for source of copy

* compile and evaluate the kernel

* add compute

* fix gen_ansor_op.py

* fix membound_TVM

* format and fix CMakeLists.txt

* fix memory leak

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>
2022-09-22 18:06:45 +08:00
Hardy c7c974f07a
Add bangc runtime and element-wise kernels
* add code for cambricon mlu, bang, cnnl

* add code for support cambricon mlu,cnnl,cnrt

* add code for support mlu

* add code for support cambricon cnnl

* add code for support mlu

* add code for mlu

* add code for mlu

* Update CMakeLists.txt

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: zhengly123 <zhengly123@outlook.com>
2022-09-22 16:57:39 +08:00
Anmuliar 90eb9d05a8
Json perfrecord (#32)
Added perfengine serialization&deserialization and corresponding test case.

* Add: perfrecord json representation.

* Add: perfrecord virtual func. to_json&from_json.

* Add: perfengine serialization and deserialization.

* Modify: tune func type to support derived struct serialization.

* Fix: structure after rebase

* Chore: Remove empty line in conv.h

Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: zhengly123 <zhengly123@outlook.com>
2022-09-22 15:34:34 +08:00
wendy12022 9032cbb973
Add: reshape/flatten/identity OP and cuda kernel (#34)
* ADD:reshape/flatten/identity operators and cuda kernel.

fix: use cudaMemcpyAsync

clang format.

ADD flatten/identity operator.

add test for reshape.

ADD: reshape operator and cuda kernel.

* Fix: separate CUDA tests & remove old header

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-21 14:04:30 +08:00
zhengly123 2f8f706f1c
Fix CMake USE_CUDA (#36)
* Fix: build lib without cuda

* Chore: rename GBMM and G2BMM files

* Fix: separate CUDA tests from operator tests

* Fix: CMake CMP0104

* Chore: fix typo

* Chore: remove unused headers

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-21 12:28:00 +08:00
zhengly123 8f67a5cc76
Add: ConvTransposed (#33)
* Add: convTransposed2d operator

* Fix: IT_ASSERT namespace

* Add: nullptr check in as for Ref

* Fix: conv transpose operator and kernel

* Fix: makes PerfEngine singleton

* Add: ConvTransposed test

* Fix: rebase to master (PerfRecord shared_ptr)

* Revert: Ref with nullptr check

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-19 15:05:39 +08:00
Hardy 6ac106cba4
Add activation operators and kernels
* add code for activation operation

* add code for activation operation on GPU

* add test code for activation operation

* add code for activation operation

* add code for activation on gpu ,use cudnn

* add code for activation on GPU use cudnn

* Chore: add constants.h and remove comments

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-16 13:58:57 +08:00
zhengly123 172d03d6f2
Fix NNet tests after migration (#27)
* Fix: interpreter

```
          4 - readlog (Failed)
          8 - test_TConv2gemm (Failed)
         11 - test_conv2conv (Failed)
         12 - test_conv2gemm (Failed)
         15 - test_g2bmm (Failed)
         16 - test_guidedDLT (Subprocess aborted)
         22 - test_mergeStage (Subprocess aborted)
```

* Exclude readlog from ctest

* Fix: change the path of logs

```
85% tests passed, 4 tests failed out of 27

Total Test time (real) = 100.69 sec

The following tests FAILED:
         10 - test_conv2conv (Timeout)
         11 - test_conv2gemm (Timeout)
         15 - test_guidedDLT (Subprocess aborted)
         21 - test_mergeStage (Subprocess aborted)
Errors while running CTest
```

- test_conv2conv 38529 ms total
- test_conv2gemm 37098 ms total

* Fix: test_mergeStage

* Fix: test_guidedDLT

```
      Start  1: test_graph
 1/27 Test  #1: test_graph .......................   Passed    0.05 sec
      Start  2: test_hash
 2/27 Test  #2: test_hash ........................   Passed    0.02 sec
      Start  3: test_conv
 3/27 Test  #3: test_conv ........................   Passed    4.98 sec
      Start  4: test_Interpreter
 4/27 Test  #4: test_Interpreter .................   Passed    6.30 sec
      Start  5: test_OpSearch
 5/27 Test  #5: test_OpSearch ....................   Passed    0.02 sec
      Start  6: test_Rule2VariableMerging
 6/27 Test  #6: test_Rule2VariableMerging ........   Passed    0.03 sec
      Start  7: test_TConv2gemm
 7/27 Test  #7: test_TConv2gemm ..................   Passed   29.45 sec
      Start  8: test_as_tvm
 8/27 Test  #8: test_as_tvm ......................   Passed    0.02 sec
      Start  9: test_compareFormulas
 9/27 Test  #9: test_compareFormulas .............   Passed    0.02 sec
      Start 10: test_conv2conv
10/27 Test #10: test_conv2conv ...................   Passed   36.55 sec
      Start 11: test_conv2gemm
11/27 Test #11: test_conv2gemm ...................   Passed   39.70 sec
      Start 12: test_dlt
12/27 Test #12: test_dlt .........................   Passed    0.03 sec
      Start 13: test_exprHash
13/27 Test #13: test_exprHash ....................   Passed    0.02 sec
      Start 14: test_g2bmm
14/27 Test #14: test_g2bmm .......................   Passed    0.16 sec
      Start 15: test_guidedDLT
15/27 Test #15: test_guidedDLT ...................   Passed    0.07 sec
      Start 16: test_matchConv
16/27 Test #16: test_matchConv ...................   Passed    0.02 sec
      Start 17: test_matchElementWise
17/27 Test #17: test_matchElementWise ............   Passed    0.03 sec
      Start 18: test_matchMatmul
18/27 Test #18: test_matchMatmul .................   Passed    0.02 sec
      Start 19: test_matchReshape
19/27 Test #19: test_matchReshape ................   Passed    0.02 sec
      Start 20: test_memboundOp
20/27 Test #20: test_memboundOp ..................   Passed    0.02 sec
      Start 21: test_mergeStage
21/27 Test #21: test_mergeStage ..................   Passed    0.02 sec
      Start 22: test_oobChecker
22/27 Test #22: test_oobChecker ..................   Passed    0.02 sec
      Start 23: test_rangeMagnify
23/27 Test #23: test_rangeMagnify ................   Passed    0.02 sec
      Start 24: test_relaxation
24/27 Test #24: test_relaxation ..................   Passed    0.02 sec
      Start 25: test_serializer
25/27 Test #25: test_serializer ..................   Passed    0.03 sec
      Start 26: test_simplify
26/27 Test #26: test_simplify ....................   Passed    0.02 sec
      Start 27: test_subset
27/27 Test #27: test_subset ......................   Passed    0.01 sec

100% tests passed, 0 tests failed out of 27

Total Test time (real) = 117.72 sec
```

* Fix: format

* Replace nnet:Ref with infini::Ref

```
      Start  1: test_graph
 1/27 Test   1: test_graph .......................   Passed    0.02 sec
      Start  2: test_hash
 2/27 Test   2: test_hash ........................   Passed    0.02 sec
      Start  3: test_conv
 3/27 Test   3: test_conv ........................   Passed    4.45 sec
      Start  4: test_Interpreter
 4/27 Test   4: test_Interpreter .................   Passed    4.37 sec
      Start  5: test_OpSearch
 5/27 Test   5: test_OpSearch ....................   Passed    0.02 sec
      Start  6: test_Rule2VariableMerging
 6/27 Test   6: test_Rule2VariableMerging ........   Passed    0.02 sec
      Start  7: test_TConv2gemm
 7/27 Test   7: test_TConv2gemm ..................   Passed   23.40 sec
      Start  8: test_as_tvm
 8/27 Test   8: test_as_tvm ......................   Passed    0.02 sec
      Start  9: test_compareFormulas
 9/27 Test   9: test_compareFormulas .............   Passed    0.01 sec
      Start 10: test_conv2conv
10/27 Test  10: test_conv2conv ...................   Passed   32.28 sec
      Start 11: test_conv2gemm
11/27 Test  11: test_conv2gemm ...................   Passed   29.41 sec
      Start 12: test_dlt
12/27 Test  12: test_dlt .........................   Passed    0.02 sec
      Start 13: test_exprHash
13/27 Test  13: test_exprHash ....................   Passed    0.01 sec
      Start 14: test_g2bmm
14/27 Test  14: test_g2bmm .......................   Passed    0.14 sec
      Start 15: test_guidedDLT
15/27 Test  15: test_guidedDLT ...................   Passed    0.06 sec
      Start 16: test_matchConv
16/27 Test  16: test_matchConv ...................   Passed    0.02 sec
      Start 17: test_matchElementWise
17/27 Test  17: test_matchElementWise ............   Passed    0.02 sec
      Start 18: test_matchMatmul
18/27 Test  18: test_matchMatmul .................   Passed    0.02 sec
      Start 19: test_matchReshape
19/27 Test  19: test_matchReshape ................   Passed    0.01 sec
      Start 20: test_memboundOp
20/27 Test  20: test_memboundOp ..................   Passed    0.02 sec
      Start 21: test_mergeStage
21/27 Test  21: test_mergeStage ..................   Passed    0.01 sec
      Start 22: test_oobChecker
22/27 Test  22: test_oobChecker ..................   Passed    0.01 sec
      Start 23: test_rangeMagnify
23/27 Test  23: test_rangeMagnify ................   Passed    0.01 sec
      Start 24: test_relaxation
24/27 Test  24: test_relaxation ..................   Passed    0.01 sec
      Start 25: test_serializer
25/27 Test  25: test_serializer ..................   Passed    0.02 sec
      Start 26: test_simplify
26/27 Test  26: test_simplify ....................   Passed    0.01 sec
      Start 27: test_subset
27/27 Test  27: test_subset ......................   Passed    0.00 sec

100% tests passed, 0 tests failed out of 27

Total Test time (real) =  94.47 sec
```

* Relax time limit for CPU conv

```
      Start  1: test_graph
 1/29 Test   1: test_graph .......................   Passed    0.02 sec
      Start  2: test_hash
 2/29 Test   2: test_hash ........................   Passed    0.02 sec
      Start  3: test_conv
 3/29 Test   3: test_conv ........................   Passed    4.47 sec
      Start  4: test_matmul
 4/29 Test   4: test_matmul ......................   Passed    2.61 sec
      Start  5: test_pooling
 5/29 Test   5: test_pooling .....................   Passed    2.57 sec
      Start  6: test_Interpreter
 6/29 Test   6: test_Interpreter .................   Passed    4.35 sec
      Start  7: test_OpSearch
 7/29 Test   7: test_OpSearch ....................   Passed    0.02 sec
      Start  8: test_Rule2VariableMerging
 8/29 Test   8: test_Rule2VariableMerging ........   Passed    0.02 sec
      Start  9: test_TConv2gemm
 9/29 Test   9: test_TConv2gemm ..................   Passed   23.32 sec
      Start 10: test_as_tvm
10/29 Test  10: test_as_tvm ......................   Passed    0.02 sec
      Start 11: test_compareFormulas
11/29 Test  11: test_compareFormulas .............   Passed    0.02 sec
      Start 12: test_conv2conv
12/29 Test  12: test_conv2conv ...................   Passed   32.12 sec
      Start 13: test_conv2gemm
13/29 Test  13: test_conv2gemm ...................   Passed   30.59 sec
      Start 14: test_dlt
14/29 Test  14: test_dlt .........................   Passed    0.02 sec
      Start 15: test_exprHash
15/29 Test  15: test_exprHash ....................   Passed    0.01 sec
      Start 16: test_g2bmm
16/29 Test  16: test_g2bmm .......................   Passed    0.14 sec
      Start 17: test_guidedDLT
17/29 Test  17: test_guidedDLT ...................   Passed    0.07 sec
      Start 18: test_matchConv
18/29 Test  18: test_matchConv ...................   Passed    0.02 sec
      Start 19: test_matchElementWise
19/29 Test  19: test_matchElementWise ............   Passed    0.02 sec
      Start 20: test_matchMatmul
20/29 Test  20: test_matchMatmul .................   Passed    0.02 sec
      Start 21: test_matchReshape
21/29 Test  21: test_matchReshape ................   Passed    0.02 sec
      Start 22: test_memboundOp
22/29 Test  22: test_memboundOp ..................   Passed    0.02 sec
      Start 23: test_mergeStage
23/29 Test  23: test_mergeStage ..................   Passed    0.01 sec
      Start 24: test_oobChecker
24/29 Test  24: test_oobChecker ..................   Passed    0.02 sec
      Start 25: test_rangeMagnify
25/29 Test  25: test_rangeMagnify ................   Passed    0.02 sec
      Start 26: test_relaxation
26/29 Test  26: test_relaxation ..................   Passed    0.02 sec
      Start 27: test_serializer
27/29 Test  27: test_serializer ..................   Passed    0.03 sec
      Start 28: test_simplify
28/29 Test  28: test_simplify ....................   Passed    0.02 sec
      Start 29: test_subset
29/29 Test  29: test_subset ......................   Passed    0.00 sec

100% tests passed, 0 tests failed out of 29

Total Test time (real) = 100.65 sec
```

* Remove out-of-date tests

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-13 15:17:22 +08:00
Hardy 03de74f4bc
Tensor serialization (#25)
* use protobuf for tensor data save, write, read (serialization and deserialization)

* add protobuf

* add code for tensor load & save from/to file

* add code for tensor load & save

* add code for tensor load & save

* add code for tensor save & load

* add code for tensor save & load

* add code for save & load

* add code for load & save

* add code for tensor load & save

* add code for tensor save & load

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
2022-09-13 11:27:41 +08:00
wendy12022 13b7a2604b
ADD add/mul/sub/div/pow operators and CPU/CUDA kernels (#26)
Fix some

remove useless code.

add div/pow kernel

Add add/mul/sub operators.

fix cpu kernel.

add element wise kernel for cuda.

ADD element wise operator.
2022-09-09 13:43:59 +08:00
Anmuliar 0409eafb5f
Operators g2bmm&gbmm transplantation (#24)
* Function tune and corresponding testcase.

*Add: Tune function in /src/kernel/cuda/conv.cc and corresponding testcase in test_conv.

*Fix: A minor bug in the use of perfRecord in /src/core/runtime.cc.

* Tune part debug

*Add: recover the code, fixed the commit error.

*Add: some annotations in tune function

* clang format test

* Fix: mem leak in CUDA Runtime and Conv

* Fix: sync in conv and default sync in timeit

* Change the way to tune operator conv.

Timeit function cudNNUnfused -> Timeit function cudnnConvolutionForward.

* Change: merge the common part of cudnnunfused&tune into cudnndescriptoraccess

* clang test

* clang-format

* clang-format bash.

* Added operator G2BMM and corresponding testcase.

*Added files related to operator G2BMM creating&calling.

*Added custom_ops.cuh&custom_op.h.

* Add operator GBMML

* new version

* Fix: G2BMM and GBMM kernel bugs

* Added testcase of operator GBMML

* clang format

* Added cmake option REQUIRE_GCC9

* Delete redundant file

* Renamed class GBMML into GBMM

* clang format

* Reviewed.

* Added cudahostcompiler option.

* Add: explicit CMAKE_CUDA_HOST_COMPILER

* Rename gbmm kernel

* Fix: nvcc warning in GBMM and G2BMM

Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-08 21:31:35 +08:00
Hardy e1d43202d7
Verify wanghailu 0902 (#22)
* commit for verify, add some difference function

* add code for verify

* add code for verify

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
2022-09-05 15:45:52 +08:00
wendy12022 c3bc278c12
Op matmul (#20)
ADD:add cuda kernel for matmul.

matmul tune

Add test_matmul.cc
2022-09-01 21:06:55 +08:00
Hardy 32a01efbbe
add code for backtrace (#21)
* add code for backtrace

* Add: infini::Exception

```
Test project /home/zly/InfiniTensor_aux/build
    Start 1: test_graph
1/4 Test #1: test_graph .......................   Passed    0.05 sec
    Start 2: test_hash
2/4 Test #2: test_hash ........................   Passed    0.02 sec
    Start 3: test_conv
3/4 Test #3: test_conv ........................   Passed    4.40 sec
    Start 4: test_pooling
4/4 Test #4: test_pooling .....................   Passed    2.47 sec

100% tests passed, 0 tests failed out of 4

Total Test time (real) =   6.94 sec
```

* Fix: USE_BACKTRACE in cmake

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-01 20:30:12 +08:00
wendy12022 48293576c0
Add maxpool and avgpool operators (#17)
* ADD:maxpool&&avgpool operators.

add OperatorObj::getDType()

clang format

FIX:timeit API has changed.

* Fix: Tensor::getInputs is const method

* Chore

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-31 14:44:53 +08:00
Anmuliar bd63f738dc
cuDNN conv tuning (#16)
* Function tune and corresponding testcase.

*Add: Tune function in /src/kernel/cuda/conv.cc and corresponding testcase in test_conv.

*Fix: A minor bug in the use of perfRecord in /src/core/runtime.cc.

* Tune part debug

*Add: recover the code, fixed the commit error.

*Add: some annotations in tune function

* clang format test

* Fix: mem leak in CUDA Runtime and Conv

* Fix: sync in conv and default sync in timeit

* Change the way to tune operator conv.

Timeit function cudNNUnfused -> Timeit function cudnnConvolutionForward.

* Change: merge the common part of cudnnunfused&tune into cudnndescriptoraccess

* clang test

* clang-format

* clang-format bash.

* Chore: remove print and blank lines

Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-29 21:37:07 +08:00
Anmuliar e076991f2f
Revert "Operator serialization (#14)" (#15)
This reverts commit 25f0c441d2.
2022-08-29 16:02:48 +08:00
Anmuliar 25f0c441d2
Operator serialization (#14)
Class "Cuda Runtime" implements function "tune" and adds corresponding testcase.

*Add: convCudnn::tune, convCudnn::cuDNNdescriptorAccess.

*Add: testcase tune.

*Fix: a minor bug in CPU Runtime.
2022-08-29 15:59:03 +08:00
zhengly123 93f86d3f4d
Simplify tensor transfer between CPU and CUDA (#10)
* Add: OP infers data type  & Graph clones tensor

* Fix: vecToString format

* Add: static assert for Tensor methods

* Rename: getDataRawPtr -> getRawDataPtr

Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-08-25 11:29:16 +08:00