InfiniTensor/include/core
Chenjie Duan 51086d2b8d
Modify kernel registration & support fp16 (#205)
* Remove dataType from the kernel registration.

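The core change: the kernel-table key no longer includes the data type, so a single registration per (device, op) pair serves every dtype. A minimal sketch of the idea (all names hypothetical, not InfiniTensor's actual API):

```cpp
#include <cstdio>
#include <functional>
#include <map>
#include <utility>

enum class Device { CUDA };
enum class OpType { Relu };
enum class DataType { Float32, Float16 };

// The key was conceptually (Device, OpType, DataType); dropping DataType
// means one kernel entry handles all dtypes and dispatches internally.
using KernelKey = std::pair<Device, OpType>;
using Kernel = std::function<void(DataType)>;

std::map<KernelKey, Kernel> &kernelTable() {
    static std::map<KernelKey, Kernel> table;
    return table;
}

int main() {
    kernelTable()[{Device::CUDA, OpType::Relu}] = [](DataType dt) {
        std::printf("relu compute, dtype tag = %d\n", static_cast<int>(dt));
    };
    kernelTable()[{Device::CUDA, OpType::Relu}](DataType::Float16); // same entry serves fp16
}
```
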
* support fp16 for conv

* cpu kernel: adapt to the new registration mechanism

* modify all kernel registrations

* add fp16 support for where

* add fp16 support for layernorm

* add fp16 support for split_concat

* element_wise: support fp16

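For the elementwise kernels, fp16 support typically means instantiating the same templated CUDA kernel for float and __half. A hedged sketch under that assumption (hypothetical names; needs cuda_fp16.h and native half arithmetic, sm_53+):

```cpp
#include <cuda_fp16.h>

// One templated kernel instantiated for float and __half; operator+ on
// __half requires device code on sm_53 or newer.
template <typename T>
__global__ void addKernel(const T *a, const T *b, T *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

enum class DataType { Float32, Float16 }; // stand-in for the runtime dtype tag

// Host-side dispatch: one registered kernel, branching on dtype at launch.
void launchAdd(DataType dt, const void *a, const void *b, void *c, int n) {
    int threads = 256, blocks = (n + threads - 1) / threads;
    if (dt == DataType::Float32)
        addKernel<<<blocks, threads>>>(static_cast<const float *>(a),
                                       static_cast<const float *>(b),
                                       static_cast<float *>(c), n);
    else
        addKernel<<<blocks, threads>>>(static_cast<const __half *>(a),
                                       static_cast<const __half *>(b),
                                       static_cast<__half *>(c), n);
}
```
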
* feat: support transpose fp16

* feat: support sliceOp fp16

* unary: support fp16

* feat: support reduceOp fp16

* feat: support matmulOp/expandOp fp16

* feat: support powOp int8

* add cuda cast & support half-precision for gather

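The cast kernel pairs naturally with half-precision gather: convert between fp32 and fp16 buffers on device. An illustrative version (invented names; the PR's actual cast kernel may differ):

```cpp
#include <cuda_fp16.h>

// fp32 -> fp16 device cast; __float2half rounds to nearest even.
__global__ void castFloatToHalf(const float *in, __half *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __float2half(in[i]);
}

// fp16 -> fp32 device cast (exact: every half value is representable as float).
__global__ void castHalfToFloat(const __half *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __half2float(in[i]);
}
```
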
* style: fix style

* feat: support int8 for gather

* style: fix style

* modify test_cuda_conv_transposed

* fix: fix dist code to support fp16

* fix(graph.cc): fix topo_sort

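For context on the topo_sort fix: a standard Kahn-style topological sort over an operator graph looks like the sketch below (generic illustration, not the actual graph.cc code):

```cpp
#include <queue>
#include <vector>

// Kahn's algorithm: repeatedly emit nodes whose remaining in-degree is zero.
std::vector<int> topoSort(const std::vector<std::vector<int>> &adj) {
    int n = static_cast<int>(adj.size());
    std::vector<int> indeg(n, 0), order;
    for (const auto &outs : adj)
        for (int v : outs)
            ++indeg[v];
    std::queue<int> ready;
    for (int u = 0; u < n; ++u)
        if (indeg[u] == 0)
            ready.push(u);
    while (!ready.empty()) {
        int u = ready.front();
        ready.pop();
        order.push_back(u);
        for (int v : adj[u])
            if (--indeg[v] == 0)
                ready.push(v);
    }
    return order; // order.size() < n would indicate a cycle
}
```
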
* fix: fix recv and send kernel registration

* feat: add field tensors for stub

* refactor(frontend): sort nodes first, then build the graph

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: provide a tensor-to-node mapping for intermediate results

* fix(slice): add guard for out-of-range area

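The slice guard amounts to bounds-checking the computed source coordinate before reading. An illustrative one-axis version (hypothetical kernel, fp32 shown for brevity):

```cpp
// One-dimensional slice with an explicit range guard on the source index.
__global__ void sliceKernel(const float *in, float *out, int outSize,
                            int start, int step, int inSize) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= outSize)
        return;
    int src = start + i * step;
    if (src < 0 || src >= inSize) // the added guard: skip out-of-range area
        return;
    out[i] = in[src];
}
```
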
* fix: fix matmul fp16

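On matmul fp16, a classic pitfall is mismatching the alpha/beta scalar type with the cuBLAS compute type. The PR doesn't say what the actual bug was; for reference, a correct fp16 GEMM call via cublasGemmEx looks roughly like this (fp32 accumulation, column-major, helper name invented):

```cpp
#include <cublas_v2.h>
#include <cuda_fp16.h>

// fp16 inputs/outputs with fp32 accumulation (CUDA 11+ compute-type enum).
// With CUBLAS_COMPUTE_32F the alpha/beta scalars must be float, not __half.
cublasStatus_t halfGemm(cublasHandle_t handle, const __half *A, const __half *B,
                        __half *C, int m, int n, int k) {
    const float alpha = 1.0f, beta = 0.0f;
    return cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha,
                        A, CUDA_R_16F, m,  // column-major: lda = m
                        B, CUDA_R_16F, k,  // ldb = k
                        &beta, C, CUDA_R_16F, m,
                        CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);
}
```
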
* fix: fix re-dataMalloc for weight tensors and the use of the naive allocator

* feat: add dataType filter for cuda kernel

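With dtype gone from the registration key, each cuda kernel can instead declare which dtypes it accepts and reject the rest before compute. A minimal sketch (names are illustrative, not the real interface):

```cpp
#include <algorithm>
#include <vector>

enum class DataType { Float32, Float16, Int8 };

// A kernel lists its supported dtypes; the dispatcher checks before compute().
struct CudaUnaryKernel {
    std::vector<DataType> supported{DataType::Float32, DataType::Float16};
    bool accepts(DataType dt) const {
        return std::find(supported.begin(), supported.end(), dt) != supported.end();
    }
};

int main() {
    CudaUnaryKernel k;
    return k.accepts(DataType::Int8) ? 1 : 0; // Int8 is filtered out, so returns 0
}
```
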
* feat: bang kernels adapt to the new registration mechanism

* fix: fix some errors on mlu

* feat: intelcpu kernels adapt to the new registration mechanism

* feat: modify kernel registration on kunlun

* fix intelcpu compiler bug

* feat: bang reshape supports all dataTypes

* fix: fix bang reduce

* fix(all_reduce.cc): fix as reviewer suggested

* fix: fix style and restore unary test codes

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: zhangyunze <z13785159769@163.com>
Co-authored-by: OdinaryWord <sx-hz@163.com>
Co-authored-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>
2024-01-15 11:02:13 +08:00
blob.h Support bang c kernel wanghailu 0927 (#43) 2022-09-30 11:01:52 +08:00
common.h fix tensor parallel for llama (#159) 2023-10-30 15:04:16 +08:00
communicator.h impl distributed launch with NCCL (#106) 2023-09-05 09:47:35 +08:00
constants.h Dev for 202303ddl (#66) 2023-04-18 15:10:33 +08:00
data_type.h framework supports graph construction for bert/gpt2 models (#94) 2023-08-29 16:06:52 +08:00
dummy_mutator.h Add search engine (#64) 2023-02-12 18:27:52 +08:00
graph.h support Dynamic tensor infer shape and fix memory pool (#176) 2023-11-23 13:11:50 +08:00
graph_handler.h remove the frontend's dependency on onnx infershape (#206) 2024-01-12 14:54:27 +08:00
graph_match.h ADD: sub graph replacement. (#56) 2023-04-17 13:09:07 +08:00
hash.h Dev for 202303ddl (#66) 2023-04-18 15:10:33 +08:00
kernel.h Modify kernel registration & support fp16 (#205) 2024-01-15 11:02:13 +08:00
lazy_allocator.h support Dynamic tensor infer shape and fix memory pool (#176) 2023-11-23 13:11:50 +08:00
mutator.h ADD: add mkl runtime for intel cpu, and add mkl kernel for matmul/conv/convtransposed. (#61) 2023-03-27 21:28:49 +08:00
object.h Dev for 202303ddl (#66) 2023-04-18 15:10:33 +08:00
op_type.h Add send and recv operators based on NCCL (#182) 2023-12-14 16:38:03 +08:00
operator.h Modify kernel registration & support fp16 (#205) 2024-01-15 11:02:13 +08:00
perf_engine.h Dev for 202303ddl (#66) 2023-04-18 15:10:33 +08:00
ref.h Dev for 202303ddl (#66) 2023-04-18 15:10:33 +08:00
runtime.h Xpu (#82) 2023-10-16 10:57:08 +08:00
search_engine.h Add search engine (#64) 2023-02-12 18:27:52 +08:00
tensor.h Modify kernel registration & support fp16 (#205) 2024-01-15 11:02:13 +08:00
tensor_base.h Copyout numpy interface (#135) 2023-09-15 16:40:44 +08:00
tensor_type.h Support kvcache (#134) 2023-09-18 14:17:02 +08:00