InfiniTensor

Commit Graph

Author	SHA1	Message	Date
Haojie Wang	8e4d88fb9f	add transpose, concat and split for native cpu (#158 )	2023-10-12 10:14:28 +08:00
PanZezhong1725	36ae7b7fb6	Add GatherElements op and cuda kernel (#149 ) * Add GatherElements op and cuda kernel * fix format * remove print * remove unused var * fix spacing * fix format --------- Co-authored-by: panzezhong@qiyuanlab.com <panzezhong@zezhongpan> Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2023-10-12 09:18:12 +08:00
PanZezhong1725	ed3034f878	Add HardSigmoid and HardSwish (#156 ) * Add HardSigmoid and HardSwish * fix format	2023-10-10 22:41:06 +08:00
kilinchange	1151101fb9	add naive allocator for debugging (#140 ) * add naive allocator only for debugging * merge redundant api --------- Co-authored-by: whjthu <haojie0429@gmail.com>	2023-10-10 16:42:23 +08:00
Haojie Wang	90b9a80f72	add onnx simplify (#153 ) * add onnx simplify * fix test bug * update ci policy * fix onnx simpilfy bug * update ci workflow	2023-10-10 15:45:27 +08:00
ChengXiang Qi	7f16fa353e	【Hackathon No.108】Add Gelu operator, ffi, kernel for cpu and gpu. (#148 ) feat: Add Gelu kernel, operator, ffi.	2023-10-10 15:21:13 +08:00
PanZezhong1725	7600fe688c	Add Neg operator and kernel (#152 ) * Add Neg operator and kernel * handle neg in to_onnx --------- Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2023-10-10 10:54:56 +08:00
Haojie Wang	7a9fcd93b2	Pooling ceil mode (#155 ) * add ceil mode for pooling * do not print debug info for allocator by default * fix test bugs after introducing pooling ceil mode * fix onnx import bug	2023-10-09 20:51:39 +08:00
PanZezhong1725	785853b0a3	Add erf kernel for cpu and gpu (#147 ) Co-authored-by: panzezhong@qiyuanlab.com <panzezhong@zezhongpan>	2023-10-09 09:36:55 +08:00
Haojie Wang	c0ff584e04	add constant op; fix concat bug (#151 )	2023-10-08 21:42:41 +08:00
Haojie Wang	f25bcca076	add python examples (#143 ) * add python examples * use copy_numpy instead of copy_float	2023-09-28 10:40:45 +08:00
kilinchange	877db21021	Fix support kvcache (#142 ) * - fix onnx.py * - fix shard_concat	2023-09-27 11:08:44 +08:00
PanZezhong1725	62be816f53	Fix incorrect split/concat results when dim=0 (#138 ) Fix split_concat kernel not supporting dim=0 Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2023-09-25 10:25:54 +08:00
Haojie Wang	8f2597a508	fix bang runtime bug after merging distributed branch (#137 )	2023-09-19 14:10:39 +08:00
kilinchange	48ec730579	Support kvcache (#134 ) * add cmake bits about NCCL * move example to examples/NNmodel * impl NCCL communicator * add comm related function to Runtime * export runtime interface * add launch.py * use unique name to distingush the the NCCL ID file * add timeout to communicator init * expose communicator obj from runtime obj, add unit test for nccl communicator * reformat files * Add allReduce operator and cuda nccl allReduce kernel * impl model parallel for resnet * add allGather nccl kernel and operator * Add allreduce allgather operator tests, change allgather kernel to output list of tensor, fix shape infer, handle nullptr output * fix format of onnx.py * use concat following AllGather * get tensor parallel for resnet * fix format of graph_handler.cc * change BUILD_DIST default to OFF * polish code of communicator * update .gitignore * export min/max to python * fix MatMul * modify launch.py to run opt * hack to treat ReduceSum as AllReduceSum * throw exception in cuda error * fix parallel_opt.py * improve the error prompt and cuda error check * fix GatherObj::GatherObj member init * fix size calculation for scalar (rank = 0) tensor * MatMul supports bias * fix add bias for row parallel gemm * add --gen_std to launch.py * fix AllReduceNCCL * update launch.py * less log * update parallel_opt * update launch.py * add __eq__ for Placement sub-classes * less benchmark run * fix placement infer for matmul * fix vacabuary size * fix Exception * Add shard tensor with group to support gpt2 * Add find successor function to find split op at different depth * recover CommunicatorObj * improve error mesasge * optimize parallel_opt.py * optimize launch.py * recover docs for all_reduce and all_gather * - support concat for kvcache * - modify allocator * - add tensorType - modify allocator to support memory allocation based on tensorType * - fix allocator init * - support kvcache by running 2 stub distributively * - fix name * - remove unused flag * - fix wrong pb name * - fix as constroy suggessed * - fix launch.py format --------- Co-authored-by: constroy <constroy.li@gmail.com> Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>	2023-09-18 14:17:02 +08:00
PanZezhong1725	c6b82cfda0	Copyout numpy interface (#135 ) * Add copy out numpy interface, delete returning buffer directly, add api test * Add dtype interface	2023-09-15 16:40:44 +08:00
constroy Li	4c321c8a91	tensor parallel for transformer (#125 ) * add cmake bits about NCCL * move example to examples/NNmodel * impl NCCL communicator * add comm related function to Runtime * export runtime interface * add launch.py * use unique name to distingush the the NCCL ID file * add timeout to communicator init * expose communicator obj from runtime obj, add unit test for nccl communicator * reformat files * Add allReduce operator and cuda nccl allReduce kernel * impl model parallel for resnet * add allGather nccl kernel and operator * Add allreduce allgather operator tests, change allgather kernel to output list of tensor, fix shape infer, handle nullptr output * fix format of onnx.py * use concat following AllGather * get tensor parallel for resnet * fix format of graph_handler.cc * change BUILD_DIST default to OFF * polish code of communicator * update .gitignore * export min/max to python * fix MatMul * modify launch.py to run opt * hack to treat ReduceSum as AllReduceSum * throw exception in cuda error * fix parallel_opt.py * improve the error prompt and cuda error check * fix GatherObj::GatherObj member init * fix size calculation for scalar (rank = 0) tensor * MatMul supports bias * fix add bias for row parallel gemm * add --gen_std to launch.py * fix AllReduceNCCL * update launch.py * less log * update parallel_opt * update launch.py * add __eq__ for Placement sub-classes * less benchmark run * fix placement infer for matmul * fix vacabuary size * fix Exception * Add shard tensor with group to support gpt2 * Add find successor function to find split op at different depth * recover CommunicatorObj * improve error mesasge * optimize parallel_opt.py * optimize launch.py * recover docs for all_reduce and all_gather * Fix API * fix format --------- Co-authored-by: panzezhong <panzezhong@qiyuanlab.com> Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2023-09-14 14:19:45 +08:00
xgqdut2016	dda668fd16	"modified where" (#131 ) * "modified where" * "adapt int or bool condition datatype" * "add broadcast_shape.h,error" * add broadcast.h * "modified broadcast_shape.h and where.cc,.cu"	2023-09-14 10:45:57 +08:00
constroy Li	f60767a770	impl distributed launch with NCCL (#106 ) * add cmake bits about NCCL * move example to examples/NNmodel * impl NCCL communicator * add comm related function to Runtime * export runtime interface * add launch.py * use unique name to distingush the the NCCL ID file * add timeout to communicator init * expose communicator obj from runtime obj, add unit test for nccl communicator * reformat files * Add allReduce operator and cuda nccl allReduce kernel * impl model parallel for resnet * add allGather nccl kernel and operator * Add allreduce allgather operator tests, change allgather kernel to output list of tensor, fix shape infer, handle nullptr output * fix format of onnx.py * use concat following AllGather * get tensor parallel for resnet * fix format of graph_handler.cc * change BUILD_DIST default to OFF * polish code of communicator * update .gitignore * Add broadcast operator and cuda kernel * Add comments for operators * remove const of class member * move communicator to CudaRuntimeObj * Add an empty line at EOF. --------- Co-authored-by: panzezhong <panzezhong@qiyuanlab.com> Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2023-09-05 09:47:35 +08:00
Hardy	b4eda85e67	Fix mlu (#87 ) * fix some operator code * fix some code of mlu operator * fix some code of cast and elementwise * clang format * remove copy kernel * fix cast * fix clang-format --------- Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2023-09-04 08:33:28 +08:00
PanZezhong1725	2412c25e67	Issue 107: Add copyin Numpy and covertion to Numpy (#126 ) * Add copyin_numpy and to_numpy for pybind TensorObj * fix copyin size assertion * fix size calculation for scalar (rank = 0) tensor * Use pybind buffer instead of returning array * fix format	2023-09-01 11:20:26 +08:00
zhangyunze	3e6ef305f1	Framework supports graph construction for bert/gpt2 models (#94 ) * feat: support to sqrt op * feat: support to erf op * feat: support to expand op * feat: support to where op * fix: gather op index can be int64_t(hard coding) * fix: some wrong use * style: fix the format style * test: add test for change op * fix: rebase to master * fix: fix matmul b compute wrong * add expand and where kernel * Add int64 support for cuda gather kernel * add test_where.cc * add "expand.(cu/cc,test,cuda),modified where.cu" * Separate initialization of datatypes to avoid compile error * modify where.(cu/cc/h,test), expand and clip * Format fix * Format fix --------- Co-authored-by: xgqdut2016 <kenan_gewei@163.com> Co-authored-by: panzezhong <panzezhong@qiyuanlab.com> Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2023-08-29 16:06:52 +08:00
ChengXiang Qi	d8ffd8a4b7	feat(env): add docker support. (#122 ) This PR adds Docker support for running this project, and it primarily accomplishes the following tasks: - Added the necessary `Dockerfile` for running the project on CPU environment. - Added commands to the `Makefile` for convenient Docker startup. - Added documentation in `docs/INSTALL_GUIDE_CN.md` explaining how to launch the Docker environment.	2023-08-28 18:34:36 +08:00
kuangjux	a8a5c037ca	feat(env): add docker support. - Added the necessary `Dockerfile` for running the project on CPU and CUDA environment. - Added commands to the `Makefile` for convenient Docker startup. - Added documentation in `docs/INSTALL_GUIDE_CN.md` explaining how to launch the Docker environment. Co-authored-by: Xiaonan Song <songxiaonan@qiyuanlab.com> Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2023-08-28 16:28:09 +08:00
PanZezhong1725	69fd251e5d	Fix kernel arguments, add debug mode (#119 ) Add debug mode macro in cmakelist.	2023-08-28 08:58:38 +08:00
panzezhong	0ce7e7651f	Fix kernel arguments, add debug mode	2023-08-24 13:39:22 +08:00
constroy Li	1e91979c76	add CUDNN impl for Min and Max (#118 ) * add cudnn impl for Min and Max * fix onnx _search_shape with output shape	2023-08-22 16:19:29 +08:00
zhangyunze	1438f14a25	fix: fix castType mlu (#117 )	2023-08-22 14:54:32 +08:00
PanZezhong1725	9cf6c30e1c	Add cuda transpose kernel (#115 ) * Add cuda transpose kernel * Empty line cuda_transpose.h * Empty line small_array.h * empty line transpose.cc * empty line transpose.cu * empty line test_cuda_transpose.cc	2023-08-22 14:22:15 +08:00
constroy Li	384407421b	cudnn activations support ND-Tensor (#116 ) * refine TensorObj::getStride * ActivationCudnn supports ND-Tensor	2023-08-22 14:21:59 +08:00
constroy Li	48847958d0	impl sqrt on CUDA (#109 ) * impl sqrt on CUDA fix parser of Gather and ReduceMean * fix test_gather * fix test_cuda_gather * impl sqrt cpu and add sqrt to test_cuda_unary * cuda_unary supports arbitary shapes * fix SplitOp with dim=-1 * fix SplitOp with dim=-1	2023-08-18 12:17:47 +08:00
zhangyunze	ef672894d0	support mixed dtype (#102 ) * feat: support mixed dtype * feat: support cast op * test: add test for cast op * feat: support datatype BFloat16 * feat: support data convert fp32 <-> bfp16 * fix: fix all op's infershape func * fix as review comment	2023-08-16 21:49:43 +08:00
kilinchange	0dc5347089	memory_allocator (#103 ) * - add LazyAllocator class - calculate memory consumption at present * - basic function of lazy_allocator, remaining test * - modify LazyAllocator * - modify InfiniTensor to fit LazyAllocator * - add setDataBlob - modify alignment - fix GraphObj::dataMalloc * - modified alignment value(64bytes -> 8bytes) - fix LazyAllocator::getPtr() - some dubug codes and commonts - do alignment by chaning size instead of tailAddr * - fix some problem * - translate chinese comments to english * - format codes * - fix test * - code format * - modify codes as YdrMaser and bitzyz suggested * - code format * - modify codes as constroy suggested * - codes format * - modify alignment on cuda * - code format * - add test_lazy_allocator - fix tests where not add input tensor into graph.tensors - fix tests where init tensor's data before calling graph->dataMallocate() * - code format * - remove gpu runtime in test_lazy_allocator * - fix test_lazy_allocator: remove cuda include * - add test * - code format * - add ifdef for test of allocator * - code format * - fix test: remove unused ifdef * - fix bang test * - code format * Merge branch 'master' into dcj/memory_allocator * fix: fix cuda conv_fp16 run fail * fix bang_runtime.cc and cuda_runtime.cc * - update mkl code * - fix codes for mkl * - code format * - remove unused commented codes - add an empty line at the end of the blob.cc --------- Co-authored-by: zhangyunze <z13785159769@163.com>	2023-08-13 13:39:35 +08:00
zhangyunze	bd9e1aeb3f	fix: fix cuda conv_fp16 run fail (#105 )	2023-08-10 15:22:18 +08:00
Derui Yang	57ac94d893	refactor(core): add new `OpType` definition (#99 ) * feat: add new OpType definition Signed-off-by: YdrMaster <ydrml@hotmail.com> * refactor: replace the old OpType with the new one and update the whole project Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: onnx import Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: fix issues in cuda and bang kernels Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: filter bang test Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: filter bang test Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix bang code. * fix code on bang * fmt Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: delete specified files Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: delete two unused files and remove a comment of unknown purpose Signed-off-by: YdrMaster <ydrml@hotmail.com> --------- Signed-off-by: YdrMaster <ydrml@hotmail.com> Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>	2023-08-07 11:17:05 +08:00
zhangyunze	9b10a74788	Support fp16 dtype (#96 ) * add conv_half kernel * Conv Kernel FP16 * dcj: replace "DataType::Float32" with "op->getDType()" to support more DataType * feat: support Float16 dtype * fix: set default clang-format to 14 version * fix: revise according to review comments * fix: add data convert to convfp16 kernel test * test: add conv_fp16 kernel test --------- Co-authored-by: zhangyue207 <zhangyue@qiyuanlab.com> Co-authored-by: kilinchange <kilinchange@163.com>	2023-08-02 16:38:16 +08:00
Derui Yang	1dc65e2788	build: implement a script to format git-added c/c++ source files (#98 ) * build: implement a script to format git-added c/c++ source files Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: extend the set of c-style file types Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: format py files Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: support formatting all modified files starting from any commit Signed-off-by: YdrMaster <ydrml@hotmail.com> --------- Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-07-21 12:29:50 +08:00
YdrMaster	f7de8113e0	fix: correct README.md (#93 ) Signed-off-by: YdrMaster <ydrml@hotmail.com>	2023-07-11 10:03:38 +08:00
YdrMaster	7023454e32	Update docs (#92 ) * docs: standardize documentation Signed-off-by: YdrMaster <ydrml@hotmail.com> * Update README.md --------- Signed-off-by: YdrMaster <ydrml@hotmail.com> Co-authored-by: zhengly123 <zhengly123@outlook.com>	2023-07-10 02:31:45 +08:00
Hardy	ab74b6a321	Update doc 0627 (#89 ) * update doc of mlu * delete README_CN.md. because the file has been split into INSTALL_GUIDE_CN.md and USER_GUIDE_CN.md at 2023.06.23 * remove the build Dependencies of test-cpp, avoid twice build * fix code --------- Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>	2023-07-06 16:57:10 +08:00
constroy	579cdbbb81	fix ReduceMean and element_wise (#90 ) * feat: export getPerfTime to python Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix parsing of ReduceMean * ReduceMean axes defaults to None * fix ElementWiseCudnn with shape broadcasting * fix format --------- Signed-off-by: YdrMaster <ydrml@hotmail.com> Co-authored-by: YdrMaster <ydrml@hotmail.com>	2023-06-29 07:15:07 +08:00
Hardy	19d7dc871d	update doc (#83 ) * update doc * update doc * update doc * update doc * add code * add code * update doc * update doc * add env.sh and update install guide * fix * fix bug * fix * add code * code format * Update exception.cc --------- Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: wanghailu <wanghailu0717@163.com>	2023-06-23 14:22:52 +08:00
YdrMaster	26f0d13c26	Dev for 202303ddl (#66 ) * add activation operation relu, tanh, sigmoid on mlu * commit for format * add activation backward operation * add test for activation_backward * add test * add convbpfilter * fix * add transpose code and test * add trigon function operation on mlu: sin,cos,tan,asin,sinh,asinh * add copy operation on mlu * add ceil operation and floor operation * add operation clip * add operation cnnl div, test and test for divdemo bangc kernel * add divnonan operation and test * add erf operation * add exp operation * add operation fill * add log operation * add log1p operation * add l2loss operation * add maximum and minimum operation * add mseloss operation * add negTensor operation * add power operation * add reciprocal operation * add sqrt and rsqrt operation * add transform operation * add addn operation * add muln operation * cherry pick some operation * add floordiv operation and floordivtrunc operation * add floormod operation * add cumsum operation * add det operation * add pad operation * format * add concat operation * format * add split operation * fix concat and split operation * add round operation * add pooling operation * add square operation * add squaredDifference operation * code format fix * add flip operation * code format fix * add hardtanh operation * add logic operation * add addcdiv and addcmul operation * add arange operation * add bitcompute operation * add net test * fmt Signed-off-by: YdrMaster <ydrml@hotmail.com> * style: rename Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: replace CpuRuntime with NativeCpuRuntime Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix code * fix code * fix code by review suggestion * remove operation which is not the onnx operation * fix format * clang format * refactor: add a templated dataToString layer to tensor print Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: onnx export Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: add computation graph optimization interface Signed-off-by: YdrMaster <ydrml@hotmail.com> * add clip operation * feat: support importing clip Signed-off-by: YdrMaster <ydrml@hotmail.com> * test: add import/export tests to ci Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix batch norm * feat: add Shape operator Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: support importing unsqueeze Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: correct clip interface feat: support importing transpose Signed-off-by: YdrMaster <ydrml@hotmail.com> * add broadcast operation * fix elementwise-broadcast * fix elementwise broadcast * add broadcast for gpu elementwise * feat: pad supports negative axes feat: export unsupported padding as a standalone pad operator feat: support importing onnxsim-simplified inception Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: correct pooling tests Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: export pads, support inception import/export, added to ci Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: support densenet import/export and add to ci Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: import squeeze Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix softmax * feat: export clip and transpose Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: support bias for Conv Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: bias of conv Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: bias of conv Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: import split Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: export split Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: conv Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: conv group Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: matmul bias was not placed in the inputs; corrected Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix example * fix: correct reduce_mean export Signed-off-by: YdrMaster <ydrml@hotmail.com> * refactor: change slice implementation to match onnx Signed-off-by: YdrMaster <ydrml@hotmail.com> * style: do not export two runtime functions Signed-off-by: YdrMaster <ydrml@hotmail.com> * doc: Chinese user guide Signed-off-by: YdrMaster <ydrml@hotmail.com> * doc: complete the guide Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: fix data import issue Signed-off-by: YdrMaster <ydrml@hotmail.com> * fmt Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: add basic Dropout structure, but outputs of two different types are not supported Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: re-export optimization interface feat: import dropout Signed-off-by: YdrMaster <ydrml@hotmail.com> * build: add BANG option to Makefile Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix code, change of test/kernels/bang/test* is use NativeCpuRuntime. change of include/bang/bang_runtime is for the cntoolkit upgrade. * feat: export bang runtime Signed-off-by: YdrMaster <ydrml@hotmail.com> * add USE_BANG=1 * fix matmul * fix reshape * fix * fix activation * fix transpose * format * format * update Makefile Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: support import/export of ConvTranspose Signed-off-by: YdrMaster <ydrml@hotmail.com> * add prelu on mlu * fix: ConvTranspose Signed-off-by: YdrMaster <ydrml@hotmail.com> * feat: support import/export of PRelu Signed-off-by: YdrMaster <ydrml@hotmail.com> * add convtrans on mlu * fmt Signed-off-by: YdrMaster <ydrml@hotmail.com> * docs: update README_CN.md Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix code by review suggestions * style Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: Softmax axis can use the default value? Seems onnx is non-standard here Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix cuda & intelcpu bugs after merging --------- Signed-off-by: YdrMaster <ydrml@hotmail.com> Co-authored-by: wanghailu <wanghailu0717@163.com> Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: whjthu <haojie0429@gmail.com>	2023-04-18 15:10:33 +08:00
zhengly123	a1974aabcd	NNET supports TVM backend and kernels (#78 ) * Add: mutator InfoGAN minimum test * Add: cache and padding (bugs!!) * Add: expression reader as a cmake target * Fix: [Intermediate] NMutator::expressionToGraph To be fix: matmul with implicit broadcast * Add: matmul broadcast * Fix: GraphObj ctor should use cloneTensor * Fix: cuBLAS failure when codegen is enabled * Add: Exception for checkCuError * Fix: graph OpList ctor * Add: expr simplication for TVM * Add: TVM headers and CMake include paths * Add: CMake config * Add: PackedFunc (broken) * Fix: remove cuCtxCreate which makes TVM fails * Fix: membound_tvm * Fix: test_memboundOp * Add: PRelu Expr and AsTVMVisitor * Add: Random generator * Add: support TVM packed function * Fix: specify runtime * Add: CMake support of TVM * Add: detailed output of Matmul * Add: comments for Matmul * Chore: format and comments * Chore: GraphObj::selfCheck without assert control * Fix: CMAKE_CXX_FLAGS in CMakeLists * fix merge bug * update api for mkl batchnorm test * fix lotus env * fig header bug --------- Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com> Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn> Co-authored-by: whjthu <haojie0429@gmail.com>	2023-04-18 00:26:36 +08:00
wendy12022	43d4798323	ADD: sub graph replacement. (#56 ) reconfig: connections among op and tensor now is managered by GraphObj . add some comments merge from master merge from master ADD: sub graph replacement reconfig inputs of op resize, due to the check of operator inputs. ResizeObj::clone clang format fix some and add test for multi-output. replacement support multi-inputs and multi-outputs. add clone for all operators add replaceSubGraph addSubGraph remove extra code add more test remove extra print Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2023-04-17 13:09:07 +08:00
wendy12022	c8b2c8ed32	Cpu backend2 (#77 ) fix review change Device::MKL to Device::INTELCPU fix mkl linkage fix errors according to merge from master now can call mkl backend fix softmax/flatten with axis from onnx. modify README.md fix memory refree add env_lotus_intelcpu.sh fix compile merge from branch cpu_backend fix something add gather fix something FIX: directory rename from "mkl" to "intelcpu" ADD: use oneMKL dpcpp interface to implement matmul kernel. ADD: add dpcpp as compiler for mkl, and fix warnings for clang compiling. add dpcpp kernel for pow. ADD: mkl kernel for pad. ADD: slice mkl kernel. ADD: reshape/flatten/identity mkl kernel. ADD: split mkl kernel. fix compile error FIX: fix flattenObj with axis. ADD reduce_mean mkl kernel. Add concat mkl kernel. bathNorm for mkl kernel. sigmoid mkl kernel. ADD：add mkl kernel for pooling add more tests for softmax Now softmax cuda kernel supports any axises. mkl kernel for softmax softmax add axis to softmax operator add mkl kernel for abs tanh ADD: relu kernel for mkl fix binary mkl primitives. add mkl kernel for binary operators fix compiler error move stream to runtime clang format add MemoryFormat for tensorObj. use post_ops for fused conv/deconv Distinguish mkl op_timer from cuda op timer. add act optype to conv and deconv add operator timer add mkl kernel for convTransposed minor fix for group conv do not use cblas_sgemm_batch CpuRuntimeObj->NativeCpuRuntimeObj add matmul op for mkl	2023-04-17 12:15:23 +08:00
Hardy	fe1afe38fa	fix code of bang conv (#76 ) * fix code of bang conv * test: also run ci on push to master Signed-off-by: YdrMaster <ydrml@hotmail.com> --------- Signed-off-by: YdrMaster <ydrml@hotmail.com> Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: YdrMaster <ydrml@hotmail.com>	2023-03-29 15:47:32 +08:00
Hardy	823e66a9ff	Support perf bang 1115 (#57 ) * support matmul * add matmul * add matmul * add code for cnnl matmul operation and test * add conv * add code for conv test on mlu * add code for test cnnl conv on mlu * add code for perf conv and matmul on mlu * clang format * fix convolution operation * fxi cmaklist * code format * fix code * code format --------- Co-authored-by: wanghailu <wanghailu@qiyuanlab.com> Co-authored-by: wanghailu <wanghailu0717@163.com>	2023-03-29 13:52:56 +08:00
wendy12022	86ec4036ce	ADD: add mkl runtime for intel cpu , and add mkl kernel for matmul/conv/convtransposed. (#61 ) * move memory format transformation to TensorObj clang format add MemoryFormat for tensorObj. use post_ops for fused conv/deconv Distinguish mkl op_timer from cuda op timer. add act optype to conv and deconv add operator timer add mkl kernel for convTransposed minor fix for group conv do not use cblas_sgemm_batch CpuRuntimeObj->NativeCpuRuntimeObj add matmul op for mkl * fix: fix bugs when rebasing from master fix: fix bugs when rebasing from master * fix: update api after rebasing * fix: fix format; fix onnx import * fix: fix clang-format * [fix] fix conv_transpose test * [fix] use stronger test case for transposed conv * [fix] remove tensor memory format; fix mkl transpose conv * [fix] add FIXME tag for op_timer python api --------- Co-authored-by: whjthu <haojie0429@gmail.com>	2023-03-27 21:28:49 +08:00
Haojie Wang	65a3abf5dc	feat: inference (#71 ) Export the inference interface; support invoking framework inference from python	2023-03-25 12:09:22 +08:00


185 Commits
