Commit Graph

275 Commits

Author SHA1 Message Date
zhangyunze 377b3bf391 fix: onnx resize op input is none bug 2024-04-30 12:11:13 +08:00
OdinaryWord 907239cf34 fix gemm & avgpooling 2024-04-29 16:10:32 +08:00
xgqdut2016 47fc0bfa99 modified batchnorm 2024-04-28 16:34:03 +08:00
xgqdut2016 ef4646ec89 modified onnx leakyrelu alpha 2024-04-28 16:03:14 +08:00
xgqdut2016 e6b98fd652 modified format 2024-04-28 15:02:14 +08:00
xgqdut2016 4d078967e0 add leakyRelu op 2024-04-28 14:53:05 +08:00
OdinaryWord 0c94b75a65 add gemm 2024-04-26 16:59:39 +08:00
OdinaryWord 775ce5040d format 2024-04-26 16:01:59 +08:00
OdinaryWord 6ba1a0648a add layernorm 2024-04-26 15:25:41 +08:00
OdinaryWord a765cd2a3d Merge branch 'ascend' of github.com:InfiniTensor/InfiniTensor into ascend 2024-04-25 17:28:18 +08:00
OdinaryWord 8b8f165158 add depthTospace&&resize 2024-04-25 17:24:33 +08:00
OdinaryWord 5b89c699dc style: fix format 2024-04-10 17:52:23 +08:00
OdinaryWord 2b8823515e style: fix format 2024-04-10 17:52:23 +08:00
OdinaryWord 87f975d969 ascend commit 0410 2024-04-10 16:47:31 +08:00
OdinaryWord 33e1521754 fix 2024-04-10 15:40:30 +08:00
OdinaryWord ec549d260b add communication operator 2024-04-10 15:13:15 +08:00
PanZezhong1725 d1de3ab5c2
feat(dist): support mixed precision in distributed scripts (#226) 2024-04-07 16:57:07 +08:00
Hardy eafbff6cf9
Support kunlun new toolkit (#224)
Co-authored-by: wanghailu <wanghailu0717@163.com>
2024-04-03 09:56:52 +08:00
OdinaryWord dddb40cd93 add conv_transpose 2024-04-02 16:38:40 +08:00
OdinaryWord a5ccf06551 add conv_transpose&&native maxpooling 2024-04-01 16:01:36 +08:00
PanZezhong1725 7f6aec6c17
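Optimizations for distributed inference of bert and gpt2 models (#221)
* fix(dist): improve the distributed script to print only the absolute error

* feat(dist): add a pytorch run script that can export onnx

* feat(front): add graph optimization for where operators whose Y value is -inf

* feat(kernel): add special-case optimization for pow and div operators where b is a constant

* fix(front): remove the frontend's dependency on global output shape information; remove unnecessary shape infer from the distributed script

* feat(kernel): add specialized optimization of the expand operation for matmul when bias is a row vector

* fix(kernel): remove unnecessary synchronization in div pow const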

* Update expand.cu

* fix: fix comments

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: Derui Yang <ydrml@hotmail.com>
2024-04-01 14:04:28 +08:00
xiaonans a98573990b
Accelerate llama (#219)
* [feature] add cudagraph support

* modify code to pass the cuda_all_reduce test

* modify rope op

* support rmsnorm

* add fp16 support to silu cuda op

* fix bugs in rmsnorm op

* uncomment simplify in onnx.py

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-01 08:46:05 +08:00
Chenjie Duan 54a35772fb
feature: add parameter to config matmul compute type (#218)
* feature: add parameter to config matmul compute type

* fix format
2024-03-26 09:00:45 +08:00
OdinaryWord fc4b62a88c add maxpooling & flatten 2024-03-13 17:25:15 +08:00
OdinaryWord 36e0840f2f support for llama 2024-02-29 14:29:28 +08:00
zhangyue 00e6cc2587
XCCL support (#171)
* add reduce_mean and gather

* fix format

* add kunlun allreduce and cmakefile

* add kunlun allreduce and cmakefile

* delete cmake opt

* fix format

* fix makefile

* add DIST option in Makefile

* add xpu allgather

* delete xpu_wait()

* add xpu allgather

* delete specific compiler

* fix format

* fix gather

* add broadcast

* fix format

* fix

* fix xpu, add where operation, fix element-wise operation

* fix softmax

* fix softmax

* log internal input and output

* fix kunlun gather bugs

* update CMakeList.txt and Makefile

* fix some kunlun kernels

* fix Makefile

* fix Makefile

* set cmake version 3.12

* format

* fix where, gather and support gpt2

* "fix format"

* fix format

* copy onnx.py from master

* use KUNLUN_HOME instead of absolute path

* fix torchvision models

* support torchvision model-zoo

* fix format

* format fix, CMakeList fix

* fix review

* fix vecToString return value

* fix format

* delete empty file

---------

Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-02-29 11:48:35 +08:00
baominghelly b51ccae3b2
fix broken link in docs (#216)
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-02-21 14:03:20 +08:00
xiaonans 1c08ba200c
[feature] add cudagraph support (#215)
* [feature] add cudagraph support

* modify code to pass the cuda_all_reduce test
2024-02-21 14:00:25 +08:00
xiaonans 900d8e58e3
Rope and silu (#214)
Add support for the silu and rotary embedding operators.
2024-02-04 11:05:27 +08:00
xiaonans b0876a13ce
Merge branch 'master' into rope_and_silu 2024-02-04 10:57:36 +08:00
xiaonans ae9f61de5a add comment for rope operator 2024-02-04 10:57:01 +08:00
xiaonans 9a3c0f11f6 add test for rotary embedding cuda kernel 2024-02-04 10:24:20 +08:00
zhangyunze 67b2bcb7d5
fix mlu some kernel registration & gather op (#210)
* fix: fix bang build/kernel registration | test_onnx

* delete assert float

* fix gather

* fix CMakeLists and Reshape

* fix cncl ops

* add hardsigmoid/hardswish

* fix

* add invalid datatype exception

* fix gather

* fix gather indices type

* fix gather/prelu/hardsigmoid on mlu

* fix format

* fix

---------

Co-authored-by: Bolun Zhang <48948016+Chamberlain0w0@users.noreply.github.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: Zhang Bolun <Chamberlain0w0@gmail.com>
2024-02-01 15:02:02 +08:00
xiaonans 956ce37458 add unittest of silu kernel 2024-01-30 10:40:13 +08:00
zhangyunze 4813204a36
feat: add reshape/identity/squeeze/flatten/unsqueeze op cpu kernel (#213) 2024-01-30 10:29:59 +08:00
OdinaryWord 9db6703b58 add reshape 2024-01-29 15:07:49 +08:00
OdinaryWord e7d34badfb fix format 2024-01-26 16:11:30 +08:00
OdinaryWord f6176124ec add softmax/element_wise kernel 2024-01-26 15:40:21 +08:00
xiaonans 030e5ca9c1 Merge branch 'master' of github.com:InfiniTensor/InfiniTensor into rope_and_silu 2024-01-26 10:16:18 +08:00
xiaonans e8d111ef5d add rope and silu support 2024-01-26 10:01:27 +08:00
xiaonans d1a90ba3e2
[feature] support kvcache with static graph (#209)
* [feature] support kvcache with static graph

* use workspace to optimize kvcache attention

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-01-25 14:20:43 +08:00
xiaonans afed5d3c3d use workspace to optimize kvcache attention 2024-01-25 10:33:01 +08:00
Haojie Wang a5062f3f89
Update README.md 2024-01-24 22:16:48 +08:00
Hardy 09b2ecf98a
support more data type on mlu (#211)
* support more data type

* clang format

* fix little bug

* fix cncl datatype

* fix format

---------

Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: Zhang Bolun <Chamberlain0w0@gmail.com>
2024-01-24 13:33:33 +08:00
OdinaryWord c970c93ba1 Merge branch 'master' into ascend 2024-01-18 15:23:47 +08:00
OdinaryWord dcbbc82d5b Merge branch 'master' into ascend 2024-01-18 15:15:55 +08:00
OdinaryWord 70950e3fbb fix concat&pooling test code 2024-01-18 14:52:36 +08:00
xiaonans 6a1bfd6c45 [feature] support kvcache with static graph 2024-01-17 11:38:44 +08:00
Chenjie Duan 51086d2b8d
Modify kernel registration & support fp16 (#205)
* - Remove dataType from the kernel registration.

* - support fp16 for conv

* - cpu kernel: adapt the new registration mechanism

* modified all register kernel

* add where fp16

* add layernorm fp16

* add split_concat fp16

* - element_wise support fp16

* feat: support transpose fp16

* feat: support sliceOp fp16

* - unary support fp16

* - feat: support reduceOp fp16

* feat: support matmulOp/expandOp fp16

* feat: support powOp int8

* add cuda cast & support half-precision for gather

* style: fix style

* feat:support int8 for gather

* style:fix style

* modified test_cuda_conv_transposed

* fix: fix dist code to support fp16

* fix(graph.cc): fix topo_sort

* fix: fix recv and send kernel registration

* feat: add field tensors for stub

* refactor(frontend): sort first, then build the graph

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: provide a tensor-to-node mapping for intermediate results

* fix (slice): add guard for area out of range

* fix: fix matmul fp16

* fix: fix re-dataMalloc for weight tensor and use of naive allocator

* feat: add dataType filter for cuda kernel

* feat: bang kernel adapt the new registration mechanism

* fix: fix some error on mlu

* feat: intelcpu kernel adapt the new registration mechanism

* feat: modify kernel registration on kunlun

* fix intelcpu compiler bug

* feat: bang reshape support all dataType

* fix: fix bang reduce

* fix(all_reduce.cc): fix as reviewer suggested

* fix: fix style and restore unary test codes

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: zhangyunze <z13785159769@163.com>
Co-authored-by: OdinaryWord <sx-hz@163.com>
Co-authored-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>
2024-01-15 11:02:13 +08:00
zhangyunze 58993d4339
Remove the frontend's dependency on onnx infershape (#206)
* feat: SqueezeOp lift the dependency of onnx infershape.

* feat: UnsqueezeOp lift the dependency of onnx infershape.

* feat: lift the dependency of onnx infershape

* fix: fix Makefile off nccl
2024-01-12 14:54:27 +08:00