Commit Graph

8 Commits

Author SHA1 Message Date
xiaonans e8d111ef5d add rope and silu support 2024-01-26 10:01:27 +08:00
Chenjie Duan 51086d2b8d
Modify kernel registration & support fp16 (#205)
* - Remove dataType from the kernel registration.

* - support fp16 for conv

* - cpu kernel: adapt the new registration mechanism

* modified all register kernel

* add where fp16

* add layernorm fp16

* add split_concat fp16

* - element_wise support fp16

* feat: support transpose fp16

* feat: support sliceOp fp16

* - unary support fp16

* - feat: support reduceOp fp16

* feat: support matmulOp/expandOp fp16

* feat: support powOp int8

* add cuda cast & support half-precision for gather

* style: fix style

* feat:support int8 for gather

* style:fix style

* modified test_cuda_conv_transposed

* fix: fix dist code to support fp16

* fix(graph.cc): fix topo_sort

* fix: fix recv and send kernel registration

* feat: add field tensors for stub

* refactor(frontend): 先排序后构图

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: 为中间结果提供tensor到node的mapping

* fix (slice): add guard for area out of range

* fix: fix matmul fp16

* fix: fix re-dataMalloc for weight tensor and use of naive allocator

* feat: add dataType filter for cuda kernel

* feat: bang kernel adapt the new registration mechanism

* fix: fix some error on mlu

* feat: intelcpu kernel adapt the new registration mechanism

* feat: modify kernel registration on kunlun

* fix intelcpu compiler bug

* feat: bang reshape support all dataType

* fix: fix bang reduce

* fix(all_reduce.cc): fix as reviewer suggessted

* fix: fix style and restore unary test codes

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: zhangyunze <z13785159769@163.com>
Co-authored-by: OdinaryWord <sx-hz@163.com>
Co-authored-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>
2024-01-15 11:02:13 +08:00
PanZezhong1725 ed3034f878
Add HardSigmoid and HardSwish (#156)
* Add HardSigmoid and HardSwish

* fix format
2023-10-10 22:41:06 +08:00
ChengXiang Qi 7f16fa353e
【Hackathon No.108】Add Gelu operator, ffi, kernel for cpu and gpu. (#148)
feat: Add Gelu kernel, operator, ffi.
2023-10-10 15:21:13 +08:00
PanZezhong1725 7600fe688c
Add Neg operator and kernel (#152)
* Add Neg operator and kernel

* handle neg in to_onnx

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-10-10 10:54:56 +08:00
PanZezhong1725 785853b0a3
Add erf kernel for cpu and gpu (#147)
Co-authored-by: panzezhong@qiyuanlab.com <panzezhong@zezhongpan>
2023-10-09 09:36:55 +08:00
constroy Li 48847958d0
impl sqrt on CUDA (#109)
* impl sqrt on CUDA
fix parser of Gather and ReduceMean

* fix test_gather

* fix test_cuda_gather

* impl sqrt cpu and add sqrt to test_cuda_unary

* cuda_unary supports arbitary shapes

* fix SplitOp with dim=-1

* fix SplitOp with dim=-1
2023-08-18 12:17:47 +08:00
Hardy 6ac106cba4
Add activation operators and kernels
* add code for activation operation

* add code for activation operation on GPU

* add test code for activation operation

* add code for activation operation

* add code for activation on gpu ,use cudnn

* add code for activation on GPU use cudnn

* Chore: add constants.h and remove comments

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-16 13:58:57 +08:00