InfiniTensor

Commit Graph

Author	SHA1	Message	Date
Hardy	fe1afe38fa	fix code of bang conv (#76 ) * fix code of bang conv * test: also run CI on pushes to master Signed-off-by: YdrMaster <ydrml@hotmail.com> --------- Signed-off-by: YdrMaster <ydrml@hotmail.com> Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: YdrMaster <ydrml@hotmail.com>	2023-03-29 15:47:32 +08:00
Hardy	823e66a9ff	Support perf bang 1115 (#57 ) * support matmul * add matmul * add matmul * add code for cnnl matmul operation and test * add conv * add code for conv test on mlu * add code for test cnnl conv on mlu * add code for perf conv and matmul on mlu * clang format * fix convolution operation * fix CMakeLists * code format * fix code * code format --------- Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: wanghailu <wanghailu0717@163.com>	2023-03-29 13:52:56 +08:00
wendy12022	86ec4036ce	ADD: add mkl runtime for intel cpu , and add mkl kernel for matmul/conv/convtransposed. (#61 ) * move memory format transformation to TensorObj clang format add MemoryFormat for tensorObj. use post_ops for fused conv/deconv Distinguish mkl op_timer from cuda op timer. add act optype to conv and deconv add operator timer add mkl kernel for convTransposed minor fix for group conv do not use cblas_sgemm_batch CpuRuntimeObj->NativeCpuRuntimeObj add matmul op for mkl * fix: fix bugs when rebasing from master fix: fix bugs when rebasing from master * fix: update api after rebasing * fix: fix format; fix onnx import * fix: fix clang-format * [fix] fix conv_transpose test * [fix] use stronger test case for transposed conv * [fix] remove tensor memory format; fix mkl transpose conv * [fix] add FIXME tag for op_timer python api --------- Co-authored-by: whjthu <haojie0429@gmail.com>	2023-03-27 21:28:49 +08:00
whjthu	d9886e9de3	fix: remove inline keyword in class; rename getter and setter for inputOf and outputOf	2023-03-25 12:04:24 +08:00
YdrMaster	9db97eb212	refactor: consolidate the methods that operate on tensor data Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-03-21 14:00:04 +08:00
YdrMaster	a27391fcdc	fix: correct the batchNorm implementation - onnx and pytorch treat the 4 batchNorm parameters as having shape [c], whereas cuDNN may expect [1,c,1,...]. The optimization now uses [c], but cuDNN inference has not been updated yet; Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-03-15 17:23:32 +08:00
YdrMaster	45a3cdfa30	feat: add a topological sort method to GraphObj, with tests Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-03-15 15:09:12 +08:00
Haojie Wang	0f52d04882	Merge branch 'master' into dev-onnx	2023-03-15 14:52:03 +08:00
deathwings602	40d1b1c91b	Add ConvTransposedNHWC (#67 ) * Add: IT_ASSERT_TODO * [WIP] Add: ConvTranspose2d mutation test * add ConvTransposedNHWC * fix test_cuda_transposed_2d --------- Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com> Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>	2023-03-01 14:15:02 +08:00
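YdrMaster	a7e58bd8d0	feat: extend the DataType types - add 6 numeric types whose indices match onnx's numbering - reshape can now be imported Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-02-14 11:27:57 +08:00
YdrMaster	296fcc5aa0	feat: create the pyinfinitensor frontend - Python frontend project structure plus packaging and installation scripts - rename the .so built for the backend to backend, and add GraphHandler for modifying the graph structure - CI now tests these features Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-02-13 09:19:05 +08:00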
zhengly123	c7ec9ee6e7	Add search engine (#64 ) * Add: tensor fuid * [Intermediate state] Add: Graph ctor for OpVec * Add: clone for operators * tmp: search_engine * search: init search Engine. * Add: dummy mutator for the test of search engine * search: add print graph. * search: add partition. * search: update comments. * Fix: remain FUID in Tensor::clone * Chore: rename GUidBaseType to UidBaseType * Fix: connect NMutator to SearchEngine * Chore: output * Fix test_memboundOp: nmutator uses input runtime * Chore: clang-format * Chore: clang-format * Fix: comments in the review --------- Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com> Co-authored-by: mazx <dyxdy@live.com>	2023-02-12 18:27:52 +08:00
wendy12022	d780f687fc	ADD: reconfig ResizeObj, support "tf_crop_and_resize " and cubic coeff kernel. (#59 ) add cubic coef add tf_crop_and_resize	2022-12-24 04:02:21 +08:00
wendy12022	c5966f8d81	Add: resize operator and cuda kernel,support nearest/linear coef. (#51 ) ADD: resize operator and cuda kernel,support nearest/linear coef. fix some fix tests add more tests for linear mode. add linear coef mode. add scales add tests fix tests. add notLarger notSmaller fix add test ADD:resize operator and cuda kernel	2022-11-14 09:30:22 +08:00
Zixuan Ma	00b2f18c17	Fix: unsigned compare in test (#50 ) fix: unsigned compare in test. Test project /home/mazx/git/InfiniTensor/build Start 1: test_graph 1/18 Test #1: test_graph ....................... Passed 0.03 sec Start 2: test_hash 2/18 Test #2: test_hash ........................ Passed 0.01 sec Start 3: test_tensor_save 3/18 Test #3: test_tensor_save ................. Passed 0.02 sec Start 4: test_verify 4/18 Test #4: test_verify ...................... Passed 0.01 sec Start 5: test_batch_norm 5/18 Test #5: test_batch_norm .................. Passed 0.01 sec Start 6: test_concat 6/18 Test #6: test_concat ...................... Passed 0.01 sec Start 7: test_conv 7/18 Test #7: test_conv ........................ Passed 0.24 sec Start 8: test_conv_transposed_2d 8/18 Test #8: test_conv_transposed_2d .......... Passed 0.01 sec Start 9: test_element_wise 9/18 Test #9: test_element_wise ................ Passed 0.01 sec Start 10: test_extend 10/18 Test #10: test_extend ...................... Passed 0.01 sec Start 11: test_gather 11/18 Test #11: test_gather ...................... Passed 0.01 sec Start 12: test_matmul 12/18 Test #12: test_matmul ...................... Passed 0.01 sec Start 13: test_pad 13/18 Test #13: test_pad ......................... Passed 0.01 sec Start 14: test_pooling 14/18 Test #14: test_pooling ..................... Passed 0.01 sec Start 15: test_reduce_mean 15/18 Test #15: test_reduce_mean ................. Passed 0.01 sec Start 16: test_reshape 16/18 Test #16: test_reshape ..................... Passed 0.01 sec Start 17: test_slice 17/18 Test #17: test_slice ....................... Passed 0.01 sec Start 18: test_split 18/18 Test #18: test_split ....................... Passed 0.02 sec 100% tests passed, 0 tests failed out of 18	2022-10-19 15:03:03 +08:00
zhengly123	4e0040c8a0	Add: connection among tensors and operators (#45 ) * Add: refs_to_wrefs and wrefs_to_refs * Add: op and tensor connection * Add: inception-v3 block test * Refactor: addOperatorAndConnect Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-10-18 22:02:51 +08:00
wendy12022	d1c913010f	ADD:reduce_mean operator and cuda kernel. (#47 ) add new line at file ending.	2022-10-15 16:53:58 +08:00
wendy12022	a4d6426589	ADD: batch norm operator and cuda kernel. (#44 ) fix numInputs of batchNorm, add new line in file ending. ADD: batch norm operator and cuda kernel. add training remove comments. fix compile error. add batch norm operator and cuda kernel.	2022-10-15 16:29:28 +08:00
Hardy	b0c2a08252	Support bang c kernel wanghailu 0927 (#43 ) * fix a little bug found by a new version of CMake * add code for support BangC language kernel , just like Cuda kernel, not library * add bangc kernel * support BangC kernel * add code for support BangC kernel * support bangc kernel * fix some code from reviewer * fix code of template function * add code for support bangc kernel * fix bangc format Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2022-09-30 11:01:52 +08:00
wendy12022	26cee55e81	ADD:extend operator and cuda kernel. (#40 ) Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2022-09-29 14:52:50 +08:00
wendy12022	fe14c91f54	ADD: Gather operator and cuda kernel. (#41 ) fix a memory leak. add tests. ADD gather cuda kernel. ADD gather operator Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2022-09-29 14:44:20 +08:00
wendy12022	3c6e208f42	ADD:concat/split operator and cuda kernels (#29 ) * ADD:concat/split operator and cuda kernels refactor minor change comment ADD:concat/split operator and cuda kernels merge split_kernel and concat_kernel to split_concat_kernel. Revert "fix" This reverts commit 459926be09a838658ec55f1e0a72b3cf17037d5c. fix ADD:concat/split operator and cuda kernels change whole tensor name to composed tensor fix some remove unused header. rebase add CudaKernel add test for split. ADD split operator and cuda kernel. modify test. ADD:concat operator and cuda kernel. ADD:concat/split operator and cuda kernels fix some remove unused header. rebase add CudaKernel ADD:concat/split operator and cuda kernels add test for split. ADD split operator and cuda kernel. modify test. ADD:concat operator and cuda kernel. * remove extra comment; typo fix. Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2022-09-29 11:01:30 +08:00
wendy12022	5560d0f2fb	ADD:pad/slice operator and cuda kernel. (#39 ) fix compile error refactor clang format split test. fix compile error. ADD slice cuda kernel. ADD slice operator. ADD:pad operator and cuda kernel.	2022-09-29 10:29:24 +08:00
zhengly123	1aefc1b27e	Add python interface for CUDA operator evaluation (#42 ) * Refactor: separate data generator * Add: python bindings for opTimer * Fix: test_perfengine Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-27 10:41:12 +08:00
deathwings602	11d5aa1ccc	Add TVM codegen for MemboundOp (#35 ) * Add: interface for membound TVM kernel and test * add getAnsorCode * add evaluation, but link failed * add evaluation of kernel, but link failed * Fix: link libcuda and nvrtc * add print * Add: const for source of copy * compile and evaluate the kernel * add compute * fix gen_ansor_op.py * fix membound_TVM * format and fix CMakeLists.txt * fix memory leak Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com> Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>	2022-09-22 18:06:45 +08:00
Hardy	c7c974f07a	Add bangc runtime and element-wise kernels * add code for cambricon mlu, bang, cnnl * add code for support cambricon mlu,cnnl,cnrt * add code for support mlu * add code for support cambricon cnnl * add code for support mlu * add code for mlu * add code for mlu ` * Update CMakeLists.txt Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: zhengly123 <zhengly123@outlook.com>	2022-09-22 16:57:39 +08:00
Anmuliar	90eb9d05a8	Json perfrecord (#32 ) Added perfengine serialization&deserialization and corresponding test case. * Add: perfrecord json representation. * Add: perfrecord virtual func. to_json&from_json. * Add: perfengine serialization and deserialization. * Modify: tune func type to support derived struct serialization. * Fix: structure after rebase * Chore: Remove empty line in conv.h Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn> Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com> Co-authored-by: zhengly123 <zhengly123@outlook.com>	2022-09-22 15:34:34 +08:00
wendy12022	9032cbb973	Add: reshape/flatten/identity OP and cuda kernel (#34 ) * ADD:reshape/flatten/identity operators and cuda kernel. fix: use cudaMemcpyAsync clang format. ADD flatten/identity operator. add test for reshape. ADD: reshape operator and cuda kernel. * Fix: separate CUDA tests & remove old header Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-21 14:04:30 +08:00
zhengly123	2f8f706f1c	Fix CMake USE_CUDA (#36 ) * Fix: build lib without cuda * Chore: rename GBMM and G2BMM files * Fix: separate CUDA tests from operator tests * Fix: CMake CMP0104 * Chore: fix typo * Chore: remove unused headers Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-21 12:28:00 +08:00
zhengly123	8f67a5cc76	Add: ConvTransposed (#33 ) * Add: convTransposed2d operator * Fix: IT_ASSERT namespace * Add: nullptr check in as for Ref * Fix: conv transpose operator and kernel * Fix: makes PerfEngine singleton * Add: ConvTransposed test * Fix: rebase to master (PerfRecord shared_ptr) * Revert: Ref with nullptr check Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-19 15:05:39 +08:00
Hardy	6ac106cba4	Add activation operators and kernels * add code for activation operation * add code for activation operation on GPU * add test code for activation operation * add code for activation operation * add code for activation on gpu ,use cudnn * add code for activation on GPU use cudnn * Chore: add constants.h and remove comments Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-16 13:58:57 +08:00
zhengly123	172d03d6f2	Fix NNet tests after migration (#27 ) * Fix: interpreter ``` 4 - readlog (Failed) 8 - test_TConv2gemm (Failed) 11 - test_conv2conv (Failed) 12 - test_conv2gemm (Failed) 15 - test_g2bmm (Failed) 16 - test_guidedDLT (Subprocess aborted) 22 - test_mergeStage (Subprocess aborted) ``` * Exclude readlog from ctest * Fix: change the path of logs ``` 85% tests passed, 4 tests failed out of 27 Total Test time (real) = 100.69 sec The following tests FAILED: 10 - test_conv2conv (Timeout) 11 - test_conv2gemm (Timeout) 15 - test_guidedDLT (Subprocess aborted) 21 - test_mergeStage (Subprocess aborted) Errors while running CTest ``` - test_conv2conv 38529 ms total - test_conv2gemm 37098 ms total * Fix: test_mergeStage * Fix: test_guidedDLT ``` Start 1: test_graph 1/27 Test #1: test_graph ....................... Passed 0.05 sec Start 2: test_hash 2/27 Test #2: test_hash ........................ Passed 0.02 sec Start 3: test_conv 3/27 Test #3: test_conv ........................ Passed 4.98 sec Start 4: test_Interpreter 4/27 Test #4: test_Interpreter ................. Passed 6.30 sec Start 5: test_OpSearch 5/27 Test #5: test_OpSearch .................... Passed 0.02 sec Start 6: test_Rule2VariableMerging 6/27 Test #6: test_Rule2VariableMerging ........ Passed 0.03 sec Start 7: test_TConv2gemm 7/27 Test #7: test_TConv2gemm .................. Passed 29.45 sec Start 8: test_as_tvm 8/27 Test #8: test_as_tvm ...................... Passed 0.02 sec Start 9: test_compareFormulas 9/27 Test #9: test_compareFormulas ............. Passed 0.02 sec Start 10: test_conv2conv 10/27 Test #10: test_conv2conv ................... Passed 36.55 sec Start 11: test_conv2gemm 11/27 Test #11: test_conv2gemm ................... Passed 39.70 sec Start 12: test_dlt 12/27 Test #12: test_dlt ......................... Passed 0.03 sec Start 13: test_exprHash 13/27 Test #13: test_exprHash .................... Passed 0.02 sec Start 14: test_g2bmm 14/27 Test #14: test_g2bmm ....................... 
Passed 0.16 sec Start 15: test_guidedDLT 15/27 Test #15: test_guidedDLT ................... Passed 0.07 sec Start 16: test_matchConv 16/27 Test #16: test_matchConv ................... Passed 0.02 sec Start 17: test_matchElementWise 17/27 Test #17: test_matchElementWise ............ Passed 0.03 sec Start 18: test_matchMatmul 18/27 Test #18: test_matchMatmul ................. Passed 0.02 sec Start 19: test_matchReshape 19/27 Test #19: test_matchReshape ................ Passed 0.02 sec Start 20: test_memboundOp 20/27 Test #20: test_memboundOp .................. Passed 0.02 sec Start 21: test_mergeStage 21/27 Test #21: test_mergeStage .................. Passed 0.02 sec Start 22: test_oobChecker 22/27 Test #22: test_oobChecker .................. Passed 0.02 sec Start 23: test_rangeMagnify 23/27 Test #23: test_rangeMagnify ................ Passed 0.02 sec Start 24: test_relaxation 24/27 Test #24: test_relaxation .................. Passed 0.02 sec Start 25: test_serializer 25/27 Test #25: test_serializer .................. Passed 0.03 sec Start 26: test_simplify 26/27 Test #26: test_simplify .................... Passed 0.02 sec Start 27: test_subset 27/27 Test #27: test_subset ...................... Passed 0.01 sec 100% tests passed, 0 tests failed out of 27 Total Test time (real) = 117.72 sec ``` * Fix: format * Replace nnet:Ref with infini::Ref ``` Start 1: test_graph 1/27 Test 1: test_graph ....................... Passed 0.02 sec Start 2: test_hash 2/27 Test 2: test_hash ........................ Passed 0.02 sec Start 3: test_conv 3/27 Test 3: test_conv ........................ Passed 4.45 sec Start 4: test_Interpreter 4/27 Test 4: test_Interpreter ................. Passed 4.37 sec Start 5: test_OpSearch 5/27 Test 5: test_OpSearch .................... Passed 0.02 sec Start 6: test_Rule2VariableMerging 6/27 Test 6: test_Rule2VariableMerging ........ Passed 0.02 sec Start 7: test_TConv2gemm 7/27 Test 7: test_TConv2gemm .................. 
Passed 23.40 sec Start 8: test_as_tvm 8/27 Test 8: test_as_tvm ...................... Passed 0.02 sec Start 9: test_compareFormulas 9/27 Test 9: test_compareFormulas ............. Passed 0.01 sec Start 10: test_conv2conv 10/27 Test 10: test_conv2conv ................... Passed 32.28 sec Start 11: test_conv2gemm 11/27 Test 11: test_conv2gemm ................... Passed 29.41 sec Start 12: test_dlt 12/27 Test 12: test_dlt ......................... Passed 0.02 sec Start 13: test_exprHash 13/27 Test 13: test_exprHash .................... Passed 0.01 sec Start 14: test_g2bmm 14/27 Test 14: test_g2bmm ....................... Passed 0.14 sec Start 15: test_guidedDLT 15/27 Test 15: test_guidedDLT ................... Passed 0.06 sec Start 16: test_matchConv 16/27 Test 16: test_matchConv ................... Passed 0.02 sec Start 17: test_matchElementWise 17/27 Test 17: test_matchElementWise ............ Passed 0.02 sec Start 18: test_matchMatmul 18/27 Test 18: test_matchMatmul ................. Passed 0.02 sec Start 19: test_matchReshape 19/27 Test 19: test_matchReshape ................ Passed 0.01 sec Start 20: test_memboundOp 20/27 Test 20: test_memboundOp .................. Passed 0.02 sec Start 21: test_mergeStage 21/27 Test 21: test_mergeStage .................. Passed 0.01 sec Start 22: test_oobChecker 22/27 Test 22: test_oobChecker .................. Passed 0.01 sec Start 23: test_rangeMagnify 23/27 Test 23: test_rangeMagnify ................ Passed 0.01 sec Start 24: test_relaxation 24/27 Test 24: test_relaxation .................. Passed 0.01 sec Start 25: test_serializer 25/27 Test 25: test_serializer .................. Passed 0.02 sec Start 26: test_simplify 26/27 Test 26: test_simplify .................... Passed 0.01 sec Start 27: test_subset 27/27 Test 27: test_subset ...................... 
Passed 0.00 sec 100% tests passed, 0 tests failed out of 27 Total Test time (real) = 94.47 sec ``` * Relax time limit for CPU conv ``` Start 1: test_graph 1/29 Test 1: test_graph ....................... Passed 0.02 sec Start 2: test_hash 2/29 Test 2: test_hash ........................ Passed 0.02 sec Start 3: test_conv 3/29 Test 3: test_conv ........................ Passed 4.47 sec Start 4: test_matmul 4/29 Test 4: test_matmul ...................... Passed 2.61 sec Start 5: test_pooling 5/29 Test 5: test_pooling ..................... Passed 2.57 sec Start 6: test_Interpreter 6/29 Test 6: test_Interpreter ................. Passed 4.35 sec Start 7: test_OpSearch 7/29 Test 7: test_OpSearch .................... Passed 0.02 sec Start 8: test_Rule2VariableMerging 8/29 Test 8: test_Rule2VariableMerging ........ Passed 0.02 sec Start 9: test_TConv2gemm 9/29 Test 9: test_TConv2gemm .................. Passed 23.32 sec Start 10: test_as_tvm 10/29 Test 10: test_as_tvm ...................... Passed 0.02 sec Start 11: test_compareFormulas 11/29 Test 11: test_compareFormulas ............. Passed 0.02 sec Start 12: test_conv2conv 12/29 Test 12: test_conv2conv ................... Passed 32.12 sec Start 13: test_conv2gemm 13/29 Test 13: test_conv2gemm ................... Passed 30.59 sec Start 14: test_dlt 14/29 Test 14: test_dlt ......................... Passed 0.02 sec Start 15: test_exprHash 15/29 Test 15: test_exprHash .................... Passed 0.01 sec Start 16: test_g2bmm 16/29 Test 16: test_g2bmm ....................... Passed 0.14 sec Start 17: test_guidedDLT 17/29 Test 17: test_guidedDLT ................... Passed 0.07 sec Start 18: test_matchConv 18/29 Test 18: test_matchConv ................... Passed 0.02 sec Start 19: test_matchElementWise 19/29 Test 19: test_matchElementWise ............ Passed 0.02 sec Start 20: test_matchMatmul 20/29 Test 20: test_matchMatmul ................. 
Passed 0.02 sec Start 21: test_matchReshape 21/29 Test 21: test_matchReshape ................ Passed 0.02 sec Start 22: test_memboundOp 22/29 Test 22: test_memboundOp .................. Passed 0.02 sec Start 23: test_mergeStage 23/29 Test 23: test_mergeStage .................. Passed 0.01 sec Start 24: test_oobChecker 24/29 Test 24: test_oobChecker .................. Passed 0.02 sec Start 25: test_rangeMagnify 25/29 Test 25: test_rangeMagnify ................ Passed 0.02 sec Start 26: test_relaxation 26/29 Test 26: test_relaxation .................. Passed 0.02 sec Start 27: test_serializer 27/29 Test 27: test_serializer .................. Passed 0.03 sec Start 28: test_simplify 28/29 Test 28: test_simplify .................... Passed 0.02 sec Start 29: test_subset 29/29 Test 29: test_subset ...................... Passed 0.00 sec 100% tests passed, 0 tests failed out of 29 Total Test time (real) = 100.65 sec ``` * Remove out-of-date tests Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-13 15:17:22 +08:00
Hardy	03de74f4bc	Tensor serialization (#25 ) * use protobuf for tensor data save, write, read (serialization and deserialization) * add protobuf * add code for tensor load & save from/to file * add code for tensor load & save * add code for tensor load & save * add code for tensor save & load * add code for tensor save & load * add code for save & load * add code for load & save * add code for tensor load & save * add code for tensor save & load Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>	2022-09-13 11:27:41 +08:00
wendy12022	13b7a2604b	ADD add/mul/sub/div/pow operators and CPU/CUDA kernels (#26 ) Fix some remove useless code. add div/pow kernel Add add/mul/sub operators. fix cpu kernel. add element wise kernel for cuda. ADD element wise operator.	2022-09-09 13:43:59 +08:00
Anmuliar	0409eafb5f	Operators g2bmm&gbmm transplantation (#24 ) * Function tune and corresponding testcase. Add: Tune function in /src/kernel/cuda/conv.cc and corresponding testcase in test_conv. Fix: A little bug of perfRecord using in /src/core/runtime.cc. * Tune part debug Add: recover the code, fixed the commit error. Add: some annotations in tune function * clang format test * Fix: mem leak in CUDA Runtime and Conv * Fix: sync in conv and default sync in timeit * Change the way to tune operator conv. Timeit function cudNNUnfused -> Timeit function cudnnConvolutionForward. * Change: merge the common part of cudnnunfused&tune into cudnndescriptoraccess * clang test * clang-format * clang-format bash. * Added operator G2BMM and corresponding testcase. Added files related to operator G2BMM creating&calling. Added custom_ops.cuh&custom_op.h. * Add operator GBMML * new version * Fix: G2BMM and GBMM kernel bugs * Added testcase of operator GBMML * clang format * Added cmake option REQUIRE_GCC9 * Delete redundant file * Renamed class GBMML into GBMM * clang format * Reviewed. * Added CUDA host compiler option. * Add: explicit CMAKE_CUDA_HOST_COMPILER * Rename gbmm kernel * Fix: nvcc warning in GBMM and G2BMM Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn> Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-08 21:31:35 +08:00
Hardy	e1d43202d7	Verify wanghailu 0902 (#22 ) * commit for verify, add some difference function * add code for verify * add code for verify Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>	2022-09-05 15:45:52 +08:00
wendy12022	c3bc278c12	Op matmul (#20 ) ADD:add cuda kernel for matmul. matmul tune Add test_matmul.cc	2022-09-01 21:06:55 +08:00
Hardy	32a01efbbe	add code for backtrace (#21 ) * add code for backtrace * Add: infini::Exception ``` Test project /home/zly/InfiniTensor_aux/build Start 1: test_graph 1/4 Test #1: test_graph ....................... Passed 0.05 sec Start 2: test_hash 2/4 Test #2: test_hash ........................ Passed 0.02 sec Start 3: test_conv 3/4 Test #3: test_conv ........................ Passed 4.40 sec Start 4: test_pooling 4/4 Test #4: test_pooling ..................... Passed 2.47 sec 100% tests passed, 0 tests failed out of 4 Total Test time (real) = 6.94 sec ``` * Fix: USE_BACKTRACE in cmake Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-01 20:30:12 +08:00
wendy12022	48293576c0	Add maxpool and avgpool operators (#17 ) * ADD:maxpool&&avgpool operators. add OperatorObj::getDType() clang format FIX:timeit API has changed. * Fix: Tensor::getInputs is const method * Chore Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-31 14:44:53 +08:00
Anmuliar	bd63f738dc	cuDNN conv tuning (#16 ) * Function tune and corresponding testcase. Add: Tune function in /src/kernel/cuda/conv.cc and corresponding testcase in test_conv. Fix: A little bug of perfRecord using in /src/core/runtime.cc. * Tune part debug Add: recover the code, fixed the commit error. Add: some annotations in tune function * clang format test * Fix: mem leak in CUDA Runtime and Conv * Fix: sync in conv and default sync in timeit * Change the way to tune operator conv. Timeit function cudNNUnfused -> Timeit function cudnnConvolutionForward. * Change: merge the common part of cudnnunfused&tune into cudnndescriptoraccess * clang test * clang-format * clang-format bash. * Chore: remove print and blank lines Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn> Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-29 21:37:07 +08:00
Anmuliar	e076991f2f	Revert "Operator serialization (#14 )" (#15 ) This reverts commit `25f0c441d2`.	2022-08-29 16:02:48 +08:00
Anmuliar	25f0c441d2	Operator serialization (#14 ) Class "Cuda Runtime" fulfills function "tune" and adds corresponding testcase. Add: convCudnn::tune, convCudnn::cuDNNdescriptorAccess. Add: testcase tune. *Fix: a brief bug in CPU Runtime.	2022-08-29 15:59:03 +08:00
zhengly123	93f86d3f4d	Simplify tensor transfer between CPU and CUDA (#10 ) * Add: OP infers data type & Graph clones tensor * Fix: vecToString format * Add: static assert for Tensor methods * Rename: getDataRawPtr -> getRawDataPtr Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-25 11:29:16 +08:00
zhengly123	af08df32d2	Extended DataType class and Runtime interaction (#9 ) * Add: DataType class * Add: data-type-oblivious tensor interface * Rename: copyBlobToCPU Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-23 16:55:59 +08:00
zhengly123	04ea5eed38	Add CUDA runtime (#6 ) * Fix: add warm-up and repetition in timing * Add: CUDA runtime and float support * Refactor: Cuda and Cpu runtimes inherit Runtime * Add: environment script for Lotus * Add: Lotus build instructions * Update README.md Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-22 15:01:03 +08:00
zhengly123	9303ddda8e	Add Conv operator and naive CPU implementation (#5 ) * Add: Conv definition * Add: tensor copy data from vector * Add: CPU conv kernel * Fix: replace Int32 with UInt32 in DataType Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-17 14:16:01 +08:00
zhengly123	a26890abce	Tensor hash and inferShape (#4 ) * Refactor: operator hash and inferShape * Add: hash without shape * Add: inferShape interface for given input tensors * Add: construct outputs in op ctor * Add: comments for matmul * Add: opType in AttrVector and WorkloadVector * Chore: _graph -> graph in Op ctor * Chore: change the "Node" suffix to "Obj" Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-15 15:08:56 +08:00
Liyan Zheng	ce5d49c79b	Add: clang format script	2022-08-09 19:50:23 +08:00
Liyan Zheng	b7e2096a26	Add: nnet code	2022-08-08 16:02:07 +08:00
Liyan Zheng	1205240218	Add: mutator abstract class	2022-08-08 15:54:17 +08:00

54 Commits