InfiniTensor

Commit Graph

Author	SHA1	Message	Date
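PanZezhong1725	7f6aec6c17	Optimizations for distributed inference of bert and gpt2 models (#221) * fix(dist): improve the distributed script to print only the absolute error * feat(dist): add a pytorch run script that can export onnx * feat(front): add graph optimization for the where operator whose Y value is -inf * feat(kernel): add special-case optimization for pow and div operators where b is a constant * fix(front): remove the frontend's dependency on global output shape information; remove unnecessary shape infer from the distributed script * feat(kernel): specialized optimization for the expand operation in matmul when bias is a row vector * fix(kernel): remove unnecessary synchronization in div pow const * Update expand.cu * fix: fix comments --------- Co-authored-by: Haojie Wang <haojie0429@gmail.com> Co-authored-by: Derui Yang <ydrml@hotmail.com>	2024-04-01 14:04:28 +08:00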
Chenjie Duan	51086d2b8d	Modify kernel registration & support fp16 (#205) * - Remove dataType from the kernel registration. * - support fp16 for conv * - cpu kernel: adapt the new registration mechanism * modified all register kernel * add where fp16 * add layernorm fp16 * add split_concat fp16 * - element_wise support fp16 * feat: support transpose fp16 * feat: support sliceOp fp16 * - unary support fp16 * - feat: support reduceOp fp16 * feat: support matmulOp/expandOp fp16 * feat: support powOp int8 * add cuda cast & support half-precision for gather * style: fix style * feat:support int8 for gather * style:fix style * modified test_cuda_conv_transposed * fix: fix dist code to support fp16 * fix(graph.cc): fix topo_sort * fix: fix recv and send kernel registration * feat: add field tensors for stub * refactor(frontend): sort first, then build the graph Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: provide tensor-to-node mapping for intermediate results * fix (slice): add guard for area out of range * fix: fix matmul fp16 * fix: fix re-dataMalloc for weight tensor and use of naive allocator * feat: add dataType filter for cuda kernel * feat: bang kernel adapt the new registration mechanism * fix: fix some error on mlu * feat: intelcpu kernel adapt the new registration mechanism * feat: modify kernel registration on kunlun * fix intelcpu compiler bug * feat: bang reshape support all dataType * fix: fix bang reduce * fix(all_reduce.cc): fix as reviewer suggested * fix: fix style and restore unary test codes --------- Signed-off-by: YdrMaster <ydrml@hotmail.com> Co-authored-by: xgqdut2016 <kenan_gewei@163.com> Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com> Co-authored-by: zhangyunze <z13785159769@163.com> Co-authored-by: OdinaryWord <sx-hz@163.com> Co-authored-by: YdrMaster <ydrml@hotmail.com> Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>	2024-01-15 11:02:13 +08:00
xgqdut2016	dda668fd16	"modified where" (#131) * "modified where" * "adapt int or bool condition datatype" * "add broadcast_shape.h,error" * add broadcast.h * "modified broadcast_shape.h and where.cc,.cu"	2023-09-14 10:45:57 +08:00
zhangyunze	3e6ef305f1	Framework supports graph construction for bert/gpt2 models (#94) * feat: support to sqrt op * feat: support to erf op * feat: support to expand op * feat: support to where op * fix: gather op index can be int64_t(hard coding) * fix: some wrong use * style: fix the format style * test: add test for change op * fix: rebase to master * fix: fix matmul b compute wrong * add expand and where kernel * Add int64 support for cuda gather kernel * add test_where.cc * add "expand.(cu/cc,test,cuda),modified where.cu" * Separate initialization of datatypes to avoid compile error * modify where.(cu/cc/h,test), expand and clip * Format fix * Format fix --------- Co-authored-by: xgqdut2016 <kenan_gewei@163.com> Co-authored-by: panzezhong <panzezhong@qiyuanlab.com> Co-authored-by: Haojie Wang <haojie0429@gmail.com>	2023-08-29 16:06:52 +08:00

4 Commits