InfiniTensor/include/core
Chenjie Duan 51086d2b8d
Modify kernel registration & support fp16 (#205)
* Remove dataType from the kernel registration.

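The core change: the kernel-table key no longer includes the data type, so a single registration per (device, op) pair serves every dtype. A minimal sketch of the idea (all names hypothetical, not InfiniTensor's actual API):

```cpp
#include <cstdio>
#include <functional>
#include <map>
#include <utility>

enum class Device { CUDA };
enum class OpType { Relu };
enum class DataType { Float32, Float16 };

// The key was conceptually (Device, OpType, DataType); dropping DataType
// means one kernel entry handles all dtypes and dispatches internally.
using KernelKey = std::pair<Device, OpType>;
using Kernel = std::function<void(DataType)>;

std::map<KernelKey, Kernel> &kernelTable() {
    static std::map<KernelKey, Kernel> table;
    return table;
}

int main() {
    kernelTable()[{Device::CUDA, OpType::Relu}] = [](DataType dt) {
        std::printf("relu compute, dtype tag = %d\n", static_cast<int>(dt));
    };
    kernelTable()[{Device::CUDA, OpType::Relu}](DataType::Float16); // same entry serves fp16
}
```
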
* support fp16 for conv

* cpu kernel: adapt to the new registration mechanism

* modify all kernel registrations

* add fp16 support for where

* add fp16 support for layernorm

* add fp16 support for split_concat

* element_wise: support fp16

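For the elementwise kernels, fp16 support typically means instantiating the same templated CUDA kernel for float and __half. A hedged sketch under that assumption (hypothetical names; needs cuda_fp16.h and native half arithmetic, sm_53+):

```cpp
#include <cuda_fp16.h>

// One templated kernel instantiated for float and __half; operator+ on
// __half requires device code on sm_53 or newer.
template <typename T>
__global__ void addKernel(const T *a, const T *b, T *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

enum class DataType { Float32, Float16 }; // stand-in for the runtime dtype tag

// Host-side dispatch: one registered kernel, branching on dtype at launch.
void launchAdd(DataType dt, const void *a, const void *b, void *c, int n) {
    int threads = 256, blocks = (n + threads - 1) / threads;
    if (dt == DataType::Float32)
        addKernel<<<blocks, threads>>>(static_cast<const float *>(a),
                                       static_cast<const float *>(b),
                                       static_cast<float *>(c), n);
    else
        addKernel<<<blocks, threads>>>(static_cast<const __half *>(a),
                                       static_cast<const __half *>(b),
                                       static_cast<__half *>(c), n);
}
```
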
* feat: support transpose fp16

* feat: support sliceOp fp16

* unary: support fp16

* feat: support reduceOp fp16

* feat: support matmulOp/expandOp fp16

* feat: support powOp int8

* add cuda cast & support half-precision for gather

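The cast kernel pairs naturally with half-precision gather: convert between fp32 and fp16 buffers on device. An illustrative version (invented names; the PR's actual cast kernel may differ):

```cpp
#include <cuda_fp16.h>

// fp32 -> fp16 device cast; __float2half rounds to nearest even.
__global__ void castFloatToHalf(const float *in, __half *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __float2half(in[i]);
}

// fp16 -> fp32 device cast (exact: every half value is representable as float).
__global__ void castHalfToFloat(const __half *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __half2float(in[i]);
}
```
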
* style: fix style

* feat: support int8 for gather

* style: fix style

* modify test_cuda_conv_transposed

* fix: fix dist code to support fp16

* fix(graph.cc): fix topo_sort

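For context on the topo_sort fix: a standard Kahn-style topological sort over an operator graph looks like the sketch below (generic illustration, not the actual graph.cc code):

```cpp
#include <queue>
#include <vector>

// Kahn's algorithm: repeatedly emit nodes whose remaining in-degree is zero.
std::vector<int> topoSort(const std::vector<std::vector<int>> &adj) {
    int n = static_cast<int>(adj.size());
    std::vector<int> indeg(n, 0), order;
    for (const auto &outs : adj)
        for (int v : outs)
            ++indeg[v];
    std::queue<int> ready;
    for (int u = 0; u < n; ++u)
        if (indeg[u] == 0)
            ready.push(u);
    while (!ready.empty()) {
        int u = ready.front();
        ready.pop();
        order.push_back(u);
        for (int v : adj[u])
            if (--indeg[v] == 0)
                ready.push(v);
    }
    return order; // order.size() < n would indicate a cycle
}
```
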
* fix: fix recv and send kernel registration

* feat: add field tensors for stub

* refactor(frontend): sort nodes first, then build the graph

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: provide a tensor-to-node mapping for intermediate results

* fix(slice): add guard for out-of-range area

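The slice guard amounts to bounds-checking the computed source coordinate before reading. An illustrative one-axis version (hypothetical kernel, fp32 shown for brevity):

```cpp
// One-dimensional slice with an explicit range guard on the source index.
__global__ void sliceKernel(const float *in, float *out, int outSize,
                            int start, int step, int inSize) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= outSize)
        return;
    int src = start + i * step;
    if (src < 0 || src >= inSize) // the added guard: skip out-of-range area
        return;
    out[i] = in[src];
}
```
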
* fix: fix matmul fp16

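On matmul fp16, a classic pitfall is mismatching the alpha/beta scalar type with the cuBLAS compute type. The PR doesn't say what the actual bug was; for reference, a correct fp16 GEMM call via cublasGemmEx looks roughly like this (fp32 accumulation, column-major, helper name invented):

```cpp
#include <cublas_v2.h>
#include <cuda_fp16.h>

// fp16 inputs/outputs with fp32 accumulation (CUDA 11+ compute-type enum).
// With CUBLAS_COMPUTE_32F the alpha/beta scalars must be float, not __half.
cublasStatus_t halfGemm(cublasHandle_t handle, const __half *A, const __half *B,
                        __half *C, int m, int n, int k) {
    const float alpha = 1.0f, beta = 0.0f;
    return cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha,
                        A, CUDA_R_16F, m,  // column-major: lda = m
                        B, CUDA_R_16F, k,  // ldb = k
                        &beta, C, CUDA_R_16F, m,
                        CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);
}
```
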
* fix: fix re-dataMalloc for weight tensors and the use of the naive allocator

* feat: add dataType filter for cuda kernel

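With dtype gone from the registration key, each cuda kernel can instead declare which dtypes it accepts and reject the rest before compute. A minimal sketch (names are illustrative, not the real interface):

```cpp
#include <algorithm>
#include <vector>

enum class DataType { Float32, Float16, Int8 };

// A kernel lists its supported dtypes; the dispatcher checks before compute().
struct CudaUnaryKernel {
    std::vector<DataType> supported{DataType::Float32, DataType::Float16};
    bool accepts(DataType dt) const {
        return std::find(supported.begin(), supported.end(), dt) != supported.end();
    }
};

int main() {
    CudaUnaryKernel k;
    return k.accepts(DataType::Int8) ? 1 : 0; // Int8 is filtered out, so returns 0
}
```
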
* feat: bang kernels adapt to the new registration mechanism

* fix: fix some errors on mlu

* feat: intelcpu kernels adapt to the new registration mechanism

* feat: modify kernel registration on kunlun

* fix intelcpu compiler bug

* feat: bang reshape supports all dataTypes

* fix: fix bang reduce

* fix(all_reduce.cc): fix as reviewer suggested

* fix: fix style and restore unary test codes

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: zhangyunze <z13785159769@163.com>
Co-authored-by: OdinaryWord <sx-hz@163.com>
Co-authored-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>
2024-01-15 11:02:13 +08:00
blob.h Support bang c kernel wanghailu 0927 (#43) 2022-09-30 11:01:52 +08:00
common.h fix tensor parallel for llama (#159) 2023-10-30 15:04:16 +08:00
communicator.h impl distributed launch with NCCL (#106) 2023-09-05 09:47:35 +08:00
constants.h Dev for 202303ddl (#66) 2023-04-18 15:10:33 +08:00
data_type.h framework supports graph construction for bert/gpt2 models (#94) 2023-08-29 16:06:52 +08:00
dummy_mutator.h Add search engine (#64) 2023-02-12 18:27:52 +08:00
graph.h support Dynamic tensor infer shape and fix memory pool (#176) 2023-11-23 13:11:50 +08:00
graph_handler.h remove the frontend's dependency on onnx infershape (#206) 2024-01-12 14:54:27 +08:00
graph_match.h ADD: sub graph replacement. (#56) 2023-04-17 13:09:07 +08:00
hash.h Dev for 202303ddl (#66) 2023-04-18 15:10:33 +08:00
kernel.h Modify kernel registration & support fp16 (#205) 2024-01-15 11:02:13 +08:00
lazy_allocator.h support Dynamic tensor infer shape and fix memory pool (#176) 2023-11-23 13:11:50 +08:00
mutator.h ADD: add mkl runtime for intel cpu, and add mkl kernel for matmul/conv/convtransposed. (#61) 2023-03-27 21:28:49 +08:00
object.h Dev for 202303ddl (#66) 2023-04-18 15:10:33 +08:00
op_type.h Add send and recv operators based on NCCL (#182) 2023-12-14 16:38:03 +08:00
operator.h Modify kernel registration & support fp16 (#205) 2024-01-15 11:02:13 +08:00
perf_engine.h Dev for 202303ddl (#66) 2023-04-18 15:10:33 +08:00
ref.h Dev for 202303ddl (#66) 2023-04-18 15:10:33 +08:00
runtime.h Xpu (#82) 2023-10-16 10:57:08 +08:00
search_engine.h Add search engine (#64) 2023-02-12 18:27:52 +08:00
tensor.h Modify kernel registration & support fp16 (#205) 2024-01-15 11:02:13 +08:00
tensor_base.h Copyout numpy interface (#135) 2023-09-15 16:40:44 +08:00
tensor_type.h Support kvcache (#134) 2023-09-18 14:17:02 +08:00