PanZezhong1725
7f6aec6c17
Optimize distributed inference for bert and gpt2 models ( #221 )
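* fix(dist): improve the distributed script to print only the absolute error
* feat(dist): add a pytorch run script that can export onnx
* feat(front): add graph optimization for where operators whose Y value is -inf
* feat(kernel): add special-case optimization for pow and div operators where b is a constant
* fix(front): remove the frontend's dependency on global output shape information; remove unnecessary shape infer from the distributed script
* feat(kernel): add specialized optimization for the expand operation when the matmul bias is a row vector
* fix(kernel): remove unnecessary synchronization in div pow const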
* Update expand.cu
* fix: fix comments
---------
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: Derui Yang <ydrml@hotmail.com>
2024-04-01 14:04:28 +08:00
Chenjie Duan
51086d2b8d
Modify kernel registration & support fp16 ( #205 )
* - Remove dataType from the kernel registration.
* - support fp16 for conv
* - cpu kernel: adapt the new registration mechanism
* modified all register kernel
* add where fp16
* add layernorm fp16
* add split_concat fp16
* - element_wise support fp16
* feat: support transpose fp16
* feat: support sliceOp fp16
* - unary support fp16
* - feat: support reduceOp fp16
* feat: support matmulOp/expandOp fp16
* feat: support powOp int8
* add cuda cast & support half-precision for gather
* style: fix style
* feat:support int8 for gather
* style:fix style
* modified test_cuda_conv_transposed
* fix: fix dist code to support fp16
* fix(graph.cc): fix topo_sort
* fix: fix recv and send kernel registration
* feat: add field tensors for stub
* refactor(frontend): sort first, then build the graph
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: provide a tensor-to-node mapping for intermediate results
* fix (slice): add guard for area out of range
* fix: fix matmul fp16
* fix: fix re-dataMalloc for weight tensor and use of naive allocator
* feat: add dataType filter for cuda kernel
* feat: bang kernel adapt the new registration mechanism
* fix: fix some errors on mlu
* feat: intelcpu kernel adapt the new registration mechanism
* feat: modify kernel registration on kunlun
* fix intelcpu compiler bug
* feat: bang reshape support all dataType
* fix: fix bang reduce
* fix(all_reduce.cc): fix as reviewer suggested
* fix: fix style and restore unary test codes
---------
Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: zhangyunze <z13785159769@163.com>
Co-authored-by: OdinaryWord <sx-hz@163.com>
Co-authored-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>
2024-01-15 11:02:13 +08:00
xgqdut2016
dda668fd16
"modified where" ( #131 )
* "modified where"
* "adapt int or bool condition datatype"
* "add broadcast_shape.h,error"
* add broadcast.h
* "modified broadcast_shape.h and where.cc,.cu"
2023-09-14 10:45:57 +08:00
zhangyunze
3e6ef305f1
Framework supports graph construction for bert/gpt2 models ( #94 )
* feat: support sqrt op
* feat: support erf op
* feat: support expand op
* feat: support where op
* fix: gather op index can be int64_t(hard coding)
* fix: some wrong use
* style: fix the format style
* test: add test for change op
* fix: rebase to master
* fix: fix incorrect matmul b computation
* add expand and where kernel
* Add int64 support for cuda gather kernel
* add test_where.cc
* add "expand.(cu/cc,test,cuda),modified where.cu"
* Separate initialization of datatypes to avoid compile error
* modify where.(cu/cc/h,test), expand and clip
* Format fix
* Format fix
---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-08-29 16:06:52 +08:00