InfiniTensor

Commit Graph

Author	SHA1	Message	Date
wendy12022	c8b2c8ed32	Cpu backend2 (#77 ) fix review change Device::MKL to Device::INTELCPU fix mkl linkage fix errors according to merge from master now can call mkl backend fix softmax/flatten with axis from onnx. modify README.md fix memory refree add env_lotus_intelcpu.sh fix compile merge from branch cpu_backend fix something add gather fix something FIX: directory rename from "mkl" to "intelcpu" ADD: use oneMKL dpcpp interface to implement matmul kernel. ADD: add dpcpp as compiler for mkl, and fix warnings for clang compiling. add dpcpp kernel for pow. ADD: mkl kernel for pad. ADD: slice mkl kernel. ADD: reshape/flatten/identity mkl kernel. ADD: split mkl kernel. fix compile error FIX: fix flattenObj with axis. ADD reduce_mean mkl kernel. Add concat mkl kernel. bathNorm for mkl kernel. sigmoid mkl kernel. ADD：add mkl kernel for pooling add more tests for softmax Now softmax cuda kernel supports any axises. mkl kernel for softmax softmax add axis to softmax operator add mkl kernel for abs tanh ADD: relu kernel for mkl fix binary mkl primitives. add mkl kernel for binary operators fix compiler error move stream to runtime clang format add MemoryFormat for tensorObj. use post_ops for fused conv/deconv Distinguish mkl op_timer from cuda op timer. add act optype to conv and deconv add operator timer add mkl kernel for convTransposed minor fix for group conv do not use cblas_sgemm_batch CpuRuntimeObj->NativeCpuRuntimeObj add matmul op for mkl	2023-04-17 12:15:23 +08:00
whjthu	d9886e9de3	fix: remove inline keyword in class; rename getter and setter for inputOf and outputOf	2023-03-25 12:04:24 +08:00
YdrMaster	e294e46436	feat: export pool to onnx Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-03-15 17:23:32 +08:00
YdrMaster	a5e692baea	feat: export batchnorm to onnx Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-03-15 17:23:32 +08:00
YdrMaster	71a87c27d1	feat: export ReduceMean to onnx Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-03-15 15:09:12 +08:00
YdrMaster	2a23669394	feat: export Reshape to onnx Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-03-15 15:09:12 +08:00
Haojie Wang	0f52d04882	Merge branch 'master' into dev-onnx	2023-03-15 14:52:03 +08:00
deathwings602	40d1b1c91b	Add ConvTransposedNHWC (#67 ) * Add: IT_ASSERT_TODO * [WIP] Add: ConvTranspose2d mutation test * add ConvTransposedNHWC * fix test_cuda_transposed_2d --------- Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com> Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>	2023-03-01 14:15:02 +08:00
YdrMaster	315763a83a	feat: frontend supports pad, with unit tests Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-02-15 11:41:06 +08:00
YdrMaster	f9d0076a86	opt: optimize the SliceObj constructor implementation Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-02-14 16:44:08 +08:00
YdrMaster	62ceb78ae3	feat: frontend supports reduceMean, with unit tests Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-02-14 15:35:01 +08:00
YdrMaster	fb9d84dbb7	opt: optimize the ReduceMeanObj constructor implementation Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-02-14 15:14:28 +08:00
YdrMaster	d11fb0ad5f	feat: frontend supports gather, with unit tests Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-02-14 14:16:01 +08:00
YdrMaster	7626efbfa8	feat: frontend supports reshape - cannot be tested because the backend does not support the INT64 type for shape opt: ReshapeObj constructor now takes all arguments by value and moves them internally Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-02-14 09:51:11 +08:00
whjthu	26be533faa	Add documentation for operators.	2023-02-13 22:51:15 +08:00
zhengly123	c7ec9ee6e7	Add search engine (#64 ) * Add: tensor fuid * [Intermediate state] Add: Graph ctor for OpVec * Add: clone for operators * tmp: search_engine * search: init search Engine. * Add: dummy mutator for the test of search engine * search: add print graph. * search: add partition. * search: update comments. * Fix: remain FUID in Tensor::clone * Chore: rename GUidBaseType to UidBaseType * Fix: connect NMutator to SearchEngine * Chore: output * Fix test_memboundOp: nmutator uses input runtime * Chore: clang-format * Chore: clang-format * Fix: comments in the review --------- Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com> Co-authored-by: mazx <dyxdy@live.com>	2023-02-12 18:27:52 +08:00
wendy12022	d780f687fc	ADD: reconfig ResizeObj, support "tf_crop_and_resize " and cubic coeff kernel. (#59 ) add cubic coef add tf_crop_and_resize	2022-12-24 04:02:21 +08:00
wendy12022	c5966f8d81	Add: resize operator and cuda kernel,support nearest/linear coef. (#51 ) ADD: resize operator and cuda kernel,support nearest/linear coef. fix some fix tests add more tests for linear mode. add linear coef mode. add scales add tests fix tests. add notLarger notSmaller fix add test ADD:resize operator and cuda kernel	2022-11-14 09:30:22 +08:00
wendy12022	d1c913010f	ADD:reduce_mean operator and cuda kernel. (#47 ) add new line at file ending.	2022-10-15 16:53:58 +08:00
wendy12022	a4d6426589	ADD: batch norm operator and cuda kernel. (#44 ) fix numInputs of batchNorm, add new line in file ending. ADD: batch norm operator and cuda kernel. add training remove comments. fix compile error. add batch norm operator and cuda kernel.	2022-10-15 16:29:28 +08:00
wendy12022	26cee55e81	ADD:extend operator and cuda kernel. (#40 ) Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2022-09-29 14:52:50 +08:00
wendy12022	fe14c91f54	ADD: Gather operator and cuda kernel. (#41 ) fix a memory leak. add tests. ADD gather cuda kernel. ADD gather operator Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2022-09-29 14:44:20 +08:00
wendy12022	3c6e208f42	ADD:concat/split operator and cuda kernels (#29 ) * ADD:concat/split operator and cuda kernels refector minor change comment ADD:concat/split operator and cuda kernels merge split_kernel and concat_kernel to split_concat_kernel. Revert "fix" This reverts commit 459926be09a838658ec55f1e0a72b3cf17037d5c. fix ADD:concat/split operator and cuda kernels change whole tensor name to composed tensor fix some remove unused header. rebase add CudaKernel add test for split. ADD split operator and cuda kernel. modify test. ADD:concat operator and cuda kernel. ADD:concat/split operator and cuda kernels fix some remove unused header. rebase add CudaKernel ADD:concat/split operator and cuda kernels add test for split. ADD split operator and cuda kernel. modify test. ADD:concat operator and cuda kernel. * remove extra comment; typo fix. Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2022-09-29 11:01:30 +08:00
wendy12022	5560d0f2fb	ADD:pad/slice operator and cuda kernel. (#39 ) fix compile error refector clang format split test. fix compile error. ADD slice cuda kernel. ADD slice operator. ADD:pad operator and cuda kernel.	2022-09-29 10:29:24 +08:00
wendy12022	9032cbb973	Add: reshape/flatten/identity OP and cuda kernel (#34 ) * ADD:reshape/flatten/identity operators and cuda kernel. fix: use cudaMemcpyAsync clang format. ADD flatten/identity operator. add test for reshape. ADD: reshape operator and cuda kernel. * Fix: separate CUDA tests & remove old header Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-21 14:04:30 +08:00
zhengly123	8f67a5cc76	Add: ConvTransposed (#33 ) * Add: convTransposed2d operator * Fix: IT_ASSERT namespace * Add: nullptr check in as for Ref * Fix: conv transpose operator and kernel * Fix: makes PerfEngine singleton * Add: ConvTransposed test * Fix: rebase to master (PerfRecord shared_ptr) * Revert: Ref with nullptr check Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-19 15:05:39 +08:00
Hardy	6ac106cba4	Add activation operators and kernels * add code for activation operation * add code for activation operation on GPU * add test code for activation operation * add code for activation operation * add code for activation on gpu ,use cudnn * add code for activation on GPU use cudnn * Chore: add constants.h and remove comments Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-16 13:58:57 +08:00
zhengly123	172d03d6f2	Fix NNet tests after migration (#27 ) * Fix: interpreter ``` 4 - readlog (Failed) 8 - test_TConv2gemm (Failed) 11 - test_conv2conv (Failed) 12 - test_conv2gemm (Failed) 15 - test_g2bmm (Failed) 16 - test_guidedDLT (Subprocess aborted) 22 - test_mergeStage (Subprocess aborted) ``` * Exclude readlog from ctest * Fix: change the path of logs ``` 85% tests passed, 4 tests failed out of 27 Total Test time (real) = 100.69 sec The following tests FAILED: 10 - test_conv2conv (Timeout) 11 - test_conv2gemm (Timeout) 15 - test_guidedDLT (Subprocess aborted) 21 - test_mergeStage (Subprocess aborted) Errors while running CTest ``` - test_conv2conv 38529 ms total - test_conv2gemm 37098 ms total * Fix: test_mergeStage * Fix: test_guidedDLT ``` Start 1: test_graph 1/27 Test #1: test_graph ....................... Passed 0.05 sec Start 2: test_hash 2/27 Test #2: test_hash ........................ Passed 0.02 sec Start 3: test_conv 3/27 Test #3: test_conv ........................ Passed 4.98 sec Start 4: test_Interpreter 4/27 Test #4: test_Interpreter ................. Passed 6.30 sec Start 5: test_OpSearch 5/27 Test #5: test_OpSearch .................... Passed 0.02 sec Start 6: test_Rule2VariableMerging 6/27 Test #6: test_Rule2VariableMerging ........ Passed 0.03 sec Start 7: test_TConv2gemm 7/27 Test #7: test_TConv2gemm .................. Passed 29.45 sec Start 8: test_as_tvm 8/27 Test #8: test_as_tvm ...................... Passed 0.02 sec Start 9: test_compareFormulas 9/27 Test #9: test_compareFormulas ............. Passed 0.02 sec Start 10: test_conv2conv 10/27 Test #10: test_conv2conv ................... Passed 36.55 sec Start 11: test_conv2gemm 11/27 Test #11: test_conv2gemm ................... Passed 39.70 sec Start 12: test_dlt 12/27 Test #12: test_dlt ......................... Passed 0.03 sec Start 13: test_exprHash 13/27 Test #13: test_exprHash .................... Passed 0.02 sec Start 14: test_g2bmm 14/27 Test #14: test_g2bmm ....................... 
Passed 0.16 sec Start 15: test_guidedDLT 15/27 Test #15: test_guidedDLT ................... Passed 0.07 sec Start 16: test_matchConv 16/27 Test #16: test_matchConv ................... Passed 0.02 sec Start 17: test_matchElementWise 17/27 Test #17: test_matchElementWise ............ Passed 0.03 sec Start 18: test_matchMatmul 18/27 Test #18: test_matchMatmul ................. Passed 0.02 sec Start 19: test_matchReshape 19/27 Test #19: test_matchReshape ................ Passed 0.02 sec Start 20: test_memboundOp 20/27 Test #20: test_memboundOp .................. Passed 0.02 sec Start 21: test_mergeStage 21/27 Test #21: test_mergeStage .................. Passed 0.02 sec Start 22: test_oobChecker 22/27 Test #22: test_oobChecker .................. Passed 0.02 sec Start 23: test_rangeMagnify 23/27 Test #23: test_rangeMagnify ................ Passed 0.02 sec Start 24: test_relaxation 24/27 Test #24: test_relaxation .................. Passed 0.02 sec Start 25: test_serializer 25/27 Test #25: test_serializer .................. Passed 0.03 sec Start 26: test_simplify 26/27 Test #26: test_simplify .................... Passed 0.02 sec Start 27: test_subset 27/27 Test #27: test_subset ...................... Passed 0.01 sec 100% tests passed, 0 tests failed out of 27 Total Test time (real) = 117.72 sec ``` * Fix: format * Replace nnet:Ref with infini::Ref ``` Start 1: test_graph 1/27 Test 1: test_graph ....................... Passed 0.02 sec Start 2: test_hash 2/27 Test 2: test_hash ........................ Passed 0.02 sec Start 3: test_conv 3/27 Test 3: test_conv ........................ Passed 4.45 sec Start 4: test_Interpreter 4/27 Test 4: test_Interpreter ................. Passed 4.37 sec Start 5: test_OpSearch 5/27 Test 5: test_OpSearch .................... Passed 0.02 sec Start 6: test_Rule2VariableMerging 6/27 Test 6: test_Rule2VariableMerging ........ Passed 0.02 sec Start 7: test_TConv2gemm 7/27 Test 7: test_TConv2gemm .................. 
Passed 23.40 sec Start 8: test_as_tvm 8/27 Test 8: test_as_tvm ...................... Passed 0.02 sec Start 9: test_compareFormulas 9/27 Test 9: test_compareFormulas ............. Passed 0.01 sec Start 10: test_conv2conv 10/27 Test 10: test_conv2conv ................... Passed 32.28 sec Start 11: test_conv2gemm 11/27 Test 11: test_conv2gemm ................... Passed 29.41 sec Start 12: test_dlt 12/27 Test 12: test_dlt ......................... Passed 0.02 sec Start 13: test_exprHash 13/27 Test 13: test_exprHash .................... Passed 0.01 sec Start 14: test_g2bmm 14/27 Test 14: test_g2bmm ....................... Passed 0.14 sec Start 15: test_guidedDLT 15/27 Test 15: test_guidedDLT ................... Passed 0.06 sec Start 16: test_matchConv 16/27 Test 16: test_matchConv ................... Passed 0.02 sec Start 17: test_matchElementWise 17/27 Test 17: test_matchElementWise ............ Passed 0.02 sec Start 18: test_matchMatmul 18/27 Test 18: test_matchMatmul ................. Passed 0.02 sec Start 19: test_matchReshape 19/27 Test 19: test_matchReshape ................ Passed 0.01 sec Start 20: test_memboundOp 20/27 Test 20: test_memboundOp .................. Passed 0.02 sec Start 21: test_mergeStage 21/27 Test 21: test_mergeStage .................. Passed 0.01 sec Start 22: test_oobChecker 22/27 Test 22: test_oobChecker .................. Passed 0.01 sec Start 23: test_rangeMagnify 23/27 Test 23: test_rangeMagnify ................ Passed 0.01 sec Start 24: test_relaxation 24/27 Test 24: test_relaxation .................. Passed 0.01 sec Start 25: test_serializer 25/27 Test 25: test_serializer .................. Passed 0.02 sec Start 26: test_simplify 26/27 Test 26: test_simplify .................... Passed 0.01 sec Start 27: test_subset 27/27 Test 27: test_subset ...................... 
Passed 0.00 sec 100% tests passed, 0 tests failed out of 27 Total Test time (real) = 94.47 sec ``` * Relax time limit for CPU conv ``` Start 1: test_graph 1/29 Test 1: test_graph ....................... Passed 0.02 sec Start 2: test_hash 2/29 Test 2: test_hash ........................ Passed 0.02 sec Start 3: test_conv 3/29 Test 3: test_conv ........................ Passed 4.47 sec Start 4: test_matmul 4/29 Test 4: test_matmul ...................... Passed 2.61 sec Start 5: test_pooling 5/29 Test 5: test_pooling ..................... Passed 2.57 sec Start 6: test_Interpreter 6/29 Test 6: test_Interpreter ................. Passed 4.35 sec Start 7: test_OpSearch 7/29 Test 7: test_OpSearch .................... Passed 0.02 sec Start 8: test_Rule2VariableMerging 8/29 Test 8: test_Rule2VariableMerging ........ Passed 0.02 sec Start 9: test_TConv2gemm 9/29 Test 9: test_TConv2gemm .................. Passed 23.32 sec Start 10: test_as_tvm 10/29 Test 10: test_as_tvm ...................... Passed 0.02 sec Start 11: test_compareFormulas 11/29 Test 11: test_compareFormulas ............. Passed 0.02 sec Start 12: test_conv2conv 12/29 Test 12: test_conv2conv ................... Passed 32.12 sec Start 13: test_conv2gemm 13/29 Test 13: test_conv2gemm ................... Passed 30.59 sec Start 14: test_dlt 14/29 Test 14: test_dlt ......................... Passed 0.02 sec Start 15: test_exprHash 15/29 Test 15: test_exprHash .................... Passed 0.01 sec Start 16: test_g2bmm 16/29 Test 16: test_g2bmm ....................... Passed 0.14 sec Start 17: test_guidedDLT 17/29 Test 17: test_guidedDLT ................... Passed 0.07 sec Start 18: test_matchConv 18/29 Test 18: test_matchConv ................... Passed 0.02 sec Start 19: test_matchElementWise 19/29 Test 19: test_matchElementWise ............ Passed 0.02 sec Start 20: test_matchMatmul 20/29 Test 20: test_matchMatmul ................. 
Passed 0.02 sec Start 21: test_matchReshape 21/29 Test 21: test_matchReshape ................ Passed 0.02 sec Start 22: test_memboundOp 22/29 Test 22: test_memboundOp .................. Passed 0.02 sec Start 23: test_mergeStage 23/29 Test 23: test_mergeStage .................. Passed 0.01 sec Start 24: test_oobChecker 24/29 Test 24: test_oobChecker .................. Passed 0.02 sec Start 25: test_rangeMagnify 25/29 Test 25: test_rangeMagnify ................ Passed 0.02 sec Start 26: test_relaxation 26/29 Test 26: test_relaxation .................. Passed 0.02 sec Start 27: test_serializer 27/29 Test 27: test_serializer .................. Passed 0.03 sec Start 28: test_simplify 28/29 Test 28: test_simplify .................... Passed 0.02 sec Start 29: test_subset 29/29 Test 29: test_subset ...................... Passed 0.00 sec 100% tests passed, 0 tests failed out of 29 Total Test time (real) = 100.65 sec ``` * Remove out-of-date tests Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-13 15:17:22 +08:00
wendy12022	13b7a2604b	ADD add/mul/sub/div/pow operators and CPU/CUDA kernels (#26 ) Fix some remove useless code. add div/pow kernel Add add/mul/sub operators. fix cpu kernel. add element wise kernel for cuda. ADD element wise operator.	2022-09-09 13:43:59 +08:00
Anmuliar	0409eafb5f	Operators g2bmm&gbmm transplantation (#24 ) * Function tune and corresponding testcase. Add: Tune function in /src/kernel/cuda/conv.cc and corresponding testcase in test_conv. Fix: A little bug of perfRecord using in /src/core/runtime.cc. * Tune part debug Add: recover the code, fixed the commit error. Add: some anotations in tune function * clang formmat test * Fix: mem leak in CUDA Runtime and Conv * Fix: sync in conv and default sync in timeit * Change the way to tune operator conv. Timeit function cudNNUnfused -> Timeit function cudnnConvolutionForward. * Change: merge the common part of cudnnunfused&tune into cudnndescriptoraccess * clang test * clang-format * clang-format bash. * Added operator G2BMM and corresponding testcase. Added files related to operator G2BMM creating&calling. Added custom_ops.cuh&custom_op.h. * Add operator GBMML * new version * Fix: G2BMM and GBMM kernel bugs * Added testcase of operator GBMML * clang format * Added cmake option REQUIRE_GCC9 * Delete redundent file * Renamed class GBMML into GBMM * clang format * Reviewed. * Added cudahostcompier option. * Add: explicit CMAKE_CUDA_HOST_COMPILER * Rename gbmm kernel * Fix: nvcc warning in GBMM and G2BMM Co-authored-by: wcz112 <wcz19@mails.tsinghua.edu.cn> Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-08 21:31:35 +08:00
wendy12022	c3bc278c12	Op matmul (#20 ) ADD:add cuda kernel for matmul. matmul tune Add test_matmul.cc	2022-09-01 21:06:55 +08:00
wendy12022	48293576c0	Add maxpool and avgpool operators (#17 ) * ADD:maxpool&&avgpool operators. add OperatorObj::getDType() clang format FIX:timeit API has changed. * Fix: Tensor::getInputs is const method * Chore Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-31 14:44:53 +08:00
zhengly123	93f86d3f4d	Simplify tensor transfer between CPU and CUDA (#10 ) * Add: OP infers data type & Graph clones tensor * Fix: vecToString format * Add: static assert for Tensor methods * Rename: getDataRawPtr -> getRawDataPtr Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-25 11:29:16 +08:00
zhengly123	9303ddda8e	Add Conv operator and naive CPU implementation (#5 ) * Add: Conv definition * Add: tensor copy data from vector * Add: CPU conv kernel * Fix: replace Int32 with UInt32 in DataType Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-17 14:16:01 +08:00
zhengly123	a26890abce	Tensor hash and inferShape (#4 ) * Refactor: operator hash and inferShape * Add: hash without shape * Add: inferShape interface for given input tensors * Add: construct outputs in op ctor * Add: comments for matmul * Add: opType in AttrVector and WorkloadVector * Chore: _graph -> graph in Op ctor * Chore: change the "Node" suffix to "Obj" Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-15 15:08:56 +08:00
Liyan Zheng	2054b0eda4	Chore: rename getOpAttrs to getOpPerfKey	2022-08-09 15:34:28 +08:00
Liyan Zheng	8b685ae4a6	Update: OpAttrs -> OpPerfKey	2022-08-09 14:58:45 +08:00
Liyan Zheng	1205240218	Add: mutator abstract class	2022-08-08 15:54:17 +08:00

38 Commits