InfiniTensor

Commit Graph

Author	SHA1	Message	Date
Haojie Wang	7a9fcd93b2	Pooling ceil mode (#155 ) * add ceil mode for pooling * do not print debug info for allocator by default * fix test bugs after introducing pooling ceil mode * fix onnx import bug	2023-10-09 20:51:39 +08:00
PanZezhong1725	2412c25e67	Issue 107: Add copyin Numpy and covertion to Numpy (#126 ) * Add copyin_numpy and to_numpy for pybind TensorObj * fix copyin size assertion * fix size calculation for scalar (rank = 0) tensor * Use pybind buffer instead of returning array * fix format	2023-09-01 11:20:26 +08:00
zhangyunze	ef672894d0	support mixed dtype (#102 ) * feat: support mixed dtype * feat: support cast op * test: add test for cast op * feat: support datatype BFloat16 * feat: support data convert fp32 <-> bfp16 * fix: fix all op's infershape func * fix as review comment	2023-08-16 21:49:43 +08:00
kilinchange	0dc5347089	memory_allocator (#103 ) * - add LazyAllocator class - calculate memory consumption at present * - basic function of lazy_allocator, remaining test * - modify LazyAllocator * - modify InfiniTensor to fit LazyAllocator * - add setDataBlob - modify alignment - fix GraphObj::dataMalloc * - modified alignment value(64bytes -> 8bytes) - fix LazyAllocator::getPtr() - some dubug codes and commonts - do alignment by chaning size instead of tailAddr * - fix some problem * - translate chinese comments to english * - format codes * - fix test * - code format * - modify codes as YdrMaser and bitzyz suggested * - code format * - modify codes as constroy suggested * - codes format * - modify alignment on cuda * - code format * - add test_lazy_allocator - fix tests where not add input tensor into graph.tensors - fix tests where init tensor's data before calling graph->dataMallocate() * - code format * - remove gpu runtime in test_lazy_allocator * - fix test_lazy_allocator: remove cuda include * - add test * - code format * - add ifdef for test of allocator * - code format * - fix test: remove unused ifdef * - fix bang test * - code format * Merge branch 'master' into dcj/memory_allocator * fix: fix cuda conv_fp16 run fail * fix bang_runtime.cc and cuda_runtime.cc * - update mkl code * - fix codes for mkl * - code format * - remove unused commented codes - add an empty line at the end of the blob.cc --------- Co-authored-by: zhangyunze <z13785159769@163.com>	2023-08-13 13:39:35 +08:00
zhangyunze	9b10a74788	Support fp16 dtype (#96 ) * add conv_half kernel * Conv Kernel FP16 * dcj: replace "DataType::Float32" with "op->getDType()" to support more DataType * feat: support Float16 dtype * fix: set default clang-format to 14 version * fix: revise according to review comments * fix: add data convert to convfp16 kernel test * test: add conv_fp16 kernel test --------- Co-authored-by: zhangyue207 <zhangyue@qiyuanlab.com> Co-authored-by: kilinchange <kilinchange@163.com>	2023-08-02 16:38:16 +08:00
YdrMaster	26f0d13c26	Dev for 202303ddl (#66 ) * add activation operatiopn relu, tanh, sigmoid on mlu * commit for format * add activation backward operation * add test for activation_backward * add test * add convbpfilter * fix * add transpsoe code and test * add trigon function operation on mlu: sin,cos,tan,asin,sinh,asinh * add copy operation on mlu * add ceil operation and floor operation * add operation clip * add operation cnnl div, test and test for divdemo bangc kernel * add divnonan operation and test * add erf operation * add exp operation * add operation fill * add log operation * add log1p operation * add l2loss operation * add maximum and minimum operation * add mseloss operation * add negTensor operation * add power operation * add reciprocal operation * add sqrt and rsqrt operation * add transform operation * add addn operation * add muln operation * cherrry pick some operation * add floordiv operation and floordivtrunc operation * add floormod operation * add cumsum operation * add det operation * add pad operation * format * add concat operation * format * add split operation * fix concat and split operation * add round operation * add pooling operation * add square operation * add squaredDifference operation * code format fix * add flip operation * code format fix * add hardtanh operation * add logic operation * add addcdiv and addcmul operation * add arange operation * add bitcompute operation * add net test * fmt Signed-off-by: YdrMaster <ydrml@hotmail.com> * style: rename Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: replace CpuRuntime with NativeCpuRuntime Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix code * fix code * fix code by review suggestion * remove operation which is not the onnx operation * fix format * clang format * refactor: add a templated dataToString layer to tensor print Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: onnx export Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: add computation graph optimization interface Signed-off-by: YdrMaster <ydrml@hotmail.com> * add clip operation * feat: support importing clip Signed-off-by: YdrMaster <ydrml@hotmail.com> * test: add import/export tests to ci Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix batch norm * feat: add Shape operator Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: support importing unsqueeze Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: correct clip interface feat: support importing transpose Signed-off-by: YdrMaster <ydrml@hotmail.com> * add broadcast operation * fix elementwise-broadcast * fix elementwise broadcast * add broadcast for gpu elementsie * feat: pad supports negative axes feat: export unsupported padding as a standalone pad operator feat: support importing inception processed by onnxsim Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: correct pooling tests Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: export pads, support inception import/export, added to ci Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: support densenet import/export and add to ci Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: import squeeze Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix softmax * feat: export clip and transpose Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: support bias for Conv Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: bias of conv Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: bias of conv Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: import split Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: export split Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: conv Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: conv group Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: matmul bias was not placed in the inputs, corrected Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix exmaple * fix: correct reduce_mean export Signed-off-by: YdrMaster <ydrml@hotmail.com> * refactor: change slice implementation to match onnx Signed-off-by: YdrMaster <ydrml@hotmail.com> * style: do not export the two runtime functions Signed-off-by: YdrMaster <ydrml@hotmail.com> * doc: Chinese usage guide Signed-off-by: YdrMaster <ydrml@hotmail.com> * doc: complete the guide Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: fix data import issue Signed-off-by: YdrMaster <ydrml@hotmail.com> * fmt Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: add basic Dropout structure, but outputs of two different types are not supported Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: re-export optimization interface feat: import dropout Signed-off-by: YdrMaster <ydrml@hotmail.com> * build: add BANG option to Makefile Signed-off-by: YdrMaster <ydrml@hotmail.com> * fxi code, change of test/kernels/bang/test* is use NativeCpuRuntime. chaneg of include/bang/bang_runtime is for the cntoolkit upgrade. * feat: export bang runtime Signed-off-by: YdrMaster <ydrml@hotmail.com> * add USE_BANG=1 * fix matmul * fix reshape * fix * fix activation * fix transpose * format * format * update Makefile Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: support importing and exporting ConvTranspose Signed-off-by: YdrMaster <ydrml@hotmail.com> * add prelu on mlu * fix: ConvTranspose Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: support importing and exporting PRelu Signed-off-by: YdrMaster <ydrml@hotmail.com> * add convtrans on mlu * fmt Signed-off-by: YdrMaster <ydrml@hotmail.com> * docs: update README_CN.md Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix code by review suggestions * style Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: Softmax axis can use the default value? Seems the onnx is non-standard Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix cuda & intelcpu bugs after merging --------- Signed-off-by: YdrMaster <ydrml@hotmail.com> Co-authored-by: wanghailu <wanghailu0717@163.com> Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: whjthu <haojie0429@gmail.com>	2023-04-18 15:10:33 +08:00
zhengly123	a1974aabcd	NNET supports TVM backend and kernels (#78 ) * Add: mutator InfoGAN minimum test * Add: cache and padding (bugs!!) * Add: expression reader as a cmake target * Fix: [Intermediate] NMutator::expressionToGraph To be fix: matmul with implicit broadcast * Add: matmul broadcast * Fix: GraphObj ctor should use cloneTensor * Fix: cuBLAS failure when codegen is enabled * Add: Exception for checkCuError * Fix: graph OpList ctor * Add: expr simplication for TVM * Add: TVM headers and CMake include paths * Add: CMake config * Add: PackedFunc (broken) * Fix: remove cuCtxCreate which makes TVM fails * Fix: membound_tvm * Fix: test_memboundOp * Add: PRelu Expr and AsTVMVisitor * Add: Random generator * Add: support TVM packed function * Fix: specify runtime * Add: CMake support of TVM * Add: detailed output of Matmul * Add: comments for Matmul * Chore: format and comments * Chore: GraphObj::selfCheck without assert control * Fix: CMAKE_CXX_FLAGS in CMakeLists * fix merge bug * update api for mkl batchnorm test * fix lotus env * fig header bug --------- Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com> Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn> Co-authored-by: whjthu <haojie0429@gmail.com>	2023-04-18 00:26:36 +08:00
wendy12022	43d4798323	ADD: sub graph replacement. (#56 ) reconfig: connections among op and tensor now is managered by GraphObj . add some comments merge from master merge from master ADD: sub graph replacement reconfig inputs of op resize, due to the check of operator inputs. ResizeObj::clone clang format fix some and add test for multi-output. replacement support multi-inputs and multi-outputs. add clone for all operators add replaceSubGraph addSubGraph remove extra code add more test remove extra print Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2023-04-17 13:09:07 +08:00
wendy12022	86ec4036ce	ADD: add mkl runtime for intel cpu , and add mkl kernel for matmul/conv/convtransposed. (#61 ) * move memory format transformation to TensorObj clang format add MemoryFormat for tensorObj. use post_ops for fused conv/deconv Distinguish mkl op_timer from cuda op timer. add act optype to conv and deconv add operator timer add mkl kernel for convTransposed minor fix for group conv do not use cblas_sgemm_batch CpuRuntimeObj->NativeCpuRuntimeObj add matmul op for mkl * fix: fix bugs when rebasing from master fix: fix bugs when rebasing from master * fix: update api after rebasing * fix: fix format; fix onnx import * fix: fix clang-format * [fix] fix conv_transpose test * [fix] use stronger test case for transposed conv * [fix] remove tensor memory format; fix mkl transpose conv * [fix] add FIXME tag for op_timer python api --------- Co-authored-by: whjthu <haojie0429@gmail.com>	2023-03-27 21:28:49 +08:00
whjthu	d9886e9de3	fix: remove inline keyword in class; rename getter and setter for inputOf and outputOf	2023-03-25 12:04:24 +08:00
YdrMaster	9db97eb212	refactor: consolidate the methods that operate on tensor data Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-03-21 14:00:04 +08:00
YdrMaster	45a3cdfa30	feat: add a topological sort method to GraphObj, with tests Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-03-15 15:09:12 +08:00
YdrMaster	a7e58bd8d0	feat: extend DataType types - added 6 algebraic types, matching the onnx numbering - reshape can now be imported Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-02-14 11:27:57 +08:00
YdrMaster	296fcc5aa0	feat: create pyinfinitensor frontend - python frontend project structure plus packaging and install scripts - the so compiled from the backend is renamed to backend, and GraphHandler is added to modify the graph structure - ci supports testing these features Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-02-13 09:19:05 +08:00
zhengly123	c7ec9ee6e7	Add search engine (#64 ) * Add: tensor fuid * [Intermediate state] Add: Graph ctor for OpVec * Add: clone for operators * tmp: search_engine * search: init search Engine. * Add: dummy mutator for the test of search engine * search: add print graph. * search: add partition. * search: update comments. * Fix: remain FUID in Tensor::clone * Chore: rename GUidBaseType to UidBaseType * Fix: connect NMutator to SearchEngine * Chore: output * Fix test_memboundOp: nmutator uses input runtime * Chore: clang-format * Chore: clang-format * Fix: comments in the review --------- Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com> Co-authored-by: mazx <dyxdy@live.com>	2023-02-12 18:27:52 +08:00
Zixuan Ma	00b2f18c17	Fix: unsigned compare in test (#50 ) fix: unsigned compare in test. Test project /home/mazx/git/InfiniTensor/build Start 1: test_graph 1/18 Test #1: test_graph ....................... Passed 0.03 sec Start 2: test_hash 2/18 Test #2: test_hash ........................ Passed 0.01 sec Start 3: test_tensor_save 3/18 Test #3: test_tensor_save ................. Passed 0.02 sec Start 4: test_verify 4/18 Test #4: test_verify ...................... Passed 0.01 sec Start 5: test_batch_norm 5/18 Test #5: test_batch_norm .................. Passed 0.01 sec Start 6: test_concat 6/18 Test #6: test_concat ...................... Passed 0.01 sec Start 7: test_conv 7/18 Test #7: test_conv ........................ Passed 0.24 sec Start 8: test_conv_transposed_2d 8/18 Test #8: test_conv_transposed_2d .......... Passed 0.01 sec Start 9: test_element_wise 9/18 Test #9: test_element_wise ................ Passed 0.01 sec Start 10: test_extend 10/18 Test #10: test_extend ...................... Passed 0.01 sec Start 11: test_gather 11/18 Test #11: test_gather ...................... Passed 0.01 sec Start 12: test_matmul 12/18 Test #12: test_matmul ...................... Passed 0.01 sec Start 13: test_pad 13/18 Test #13: test_pad ......................... Passed 0.01 sec Start 14: test_pooling 14/18 Test #14: test_pooling ..................... Passed 0.01 sec Start 15: test_reduce_mean 15/18 Test #15: test_reduce_mean ................. Passed 0.01 sec Start 16: test_reshape 16/18 Test #16: test_reshape ..................... Passed 0.01 sec Start 17: test_slice 17/18 Test #17: test_slice ....................... Passed 0.01 sec Start 18: test_split 18/18 Test #18: test_split ....................... Passed 0.02 sec 100% tests passed, 0 tests failed out of 18	2022-10-19 15:03:03 +08:00
zhengly123	4e0040c8a0	Add: connection among tensors and operators (#45 ) * Add: refs_to_wrefs and wrefs_to_refs * Add: op and tensor connection * Add: inception-v3 block test * Refactor: addOperatorAndConnect Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-10-18 22:02:51 +08:00
Hardy	03de74f4bc	Tensor serialization (#25 ) * use protobuf for tensor data save,write,read (serialization and deserialization) * add protobuf * add code for tensor load & save from/to file * add code for tensor laod & save * add code for tensor load & save * add code for tensor save & load * add code for tensor save & load * add code for save & load * add code for load & save * add code for tensor load & save * add code for tensor save & load Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>	2022-09-13 11:27:41 +08:00
Hardy	e1d43202d7	Verify wanghailu 0902 (#22 ) * commit for verify, add some difference function * add code for verify * add code for verify Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>	2022-09-05 15:45:52 +08:00
Hardy	32a01efbbe	add code for backtrace (#21 ) * add code for backtrace * Add: infini::Exception ``` Test project /home/zly/InfiniTensor_aux/build Start 1: test_graph 1/4 Test #1: test_graph ....................... Passed 0.05 sec Start 2: test_hash 2/4 Test #2: test_hash ........................ Passed 0.02 sec Start 3: test_conv 3/4 Test #3: test_conv ........................ Passed 4.40 sec Start 4: test_pooling 4/4 Test #4: test_pooling ..................... Passed 2.47 sec 100% tests passed, 0 tests failed out of 4 Total Test time (real) = 6.94 sec ``` * Fix: USE_BACKTRACE in cmake Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-09-01 20:30:12 +08:00
wendy12022	48293576c0	Add maxpool and avgpool operators (#17 ) * ADD:maxpool&&avgpool operators. add OperatorObj::getDType() clang format FIX:timeit API has changed. * Fix: Tensor::getInputs is const method * Chore Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-31 14:44:53 +08:00
zhengly123	93f86d3f4d	Simplify tensor transfer between CPU and CUDA (#10 ) * Add: OP infers data type & Graph clones tensor * Fix: vecToString format * Add: static assert for Tensor methods * Rename: getDataRawPtr -> getRawDataPtr Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-25 11:29:16 +08:00
zhengly123	af08df32d2	Extended DataType class and Runtime interaction (#9 ) * Add: DataType class * Add: data-type-oblivious tensor interface * Rename: copyBlobToCPU Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-23 16:55:59 +08:00
zhengly123	04ea5eed38	Add CUDA runtime (#6 ) * Fix: add warm-up and repetition in timing * Add: CUDA runtime and float support * Refactor: Cuda and Cpu runtimes inherit Runtime * Add: environment script for Lotus * Add: Lotus build instructions * Update README.md Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-22 15:01:03 +08:00
zhengly123	9303ddda8e	Add Conv operator and naive CPU implemenation (#5 ) * Add: Conv definition * Add: tensor copy data from vector * Add: CPU conv kernel * Fix: replace Int32 with UInt32 in DataType Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-17 14:16:01 +08:00
zhengly123	a26890abce	Tensor hash and inferShape (#4 ) * Refactor: operator hash and inferShape * Add: hash without shape * Add: inferShape interface for given input tensors * Add: construct outputs in op ctor * Add: comments for matmul * Add: opType in AttrVector and WorkloadVector * Chore: _graph -> graph in Op ctor * Chore: change the "Node" suffix to "Obj" Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>	2022-08-15 15:08:56 +08:00
Liyan Zheng	1205240218	Add: mutator abstract class	2022-08-08 15:54:17 +08:00
Liyan Zheng	efa966a3e2	Add: perf engine	2022-08-07 21:12:17 +08:00
Liyan Zheng	6c356d5b42	Add: kernel registry and naive Matmul kernel	2022-08-06 15:58:40 +08:00
Liyan Zheng	559be5866d	Add: Matmul operator	2022-08-05 12:50:34 +08:00
Liyan Zheng	e6101b0336	Add: graph, tensor, and operator	2022-07-31 21:44:03 +08:00

31 Commits