Commit Graph

283 Commits

Author SHA1 Message Date
xgqdut2016 5747eb8f7d modified format 2024-05-07 16:31:53 +08:00
xgqdut2016 9384cec7de add pad2d kernel 2024-05-07 16:22:29 +08:00
Haojie Wang f0509facc6 Merge branch 'master' into ascend 2024-05-06 10:25:20 +08:00
zhangyunze 6ad05da684 fix: onnx resize op input is none bug 2024-04-30 16:11:23 +08:00
xgqdut2016 6a89946736 modified format, replace layernorm with instancenorm 2024-04-30 15:04:12 +08:00
xgqdut2016 0fcaf001c4 add instancenorm, use layernorm to replace instancenorm, error 2024-04-30 14:56:08 +08:00
OdinaryWord 907239cf34 fix gemm & avgpooling 2024-04-29 16:10:32 +08:00
xgqdut2016 47fc0bfa99 modified batchnorm 2024-04-28 16:34:03 +08:00
xgqdut2016 ef4646ec89 modified onnx leakyrelu alpha 2024-04-28 16:03:14 +08:00
xgqdut2016 e6b98fd652 modified format 2024-04-28 15:02:14 +08:00
xgqdut2016 4d078967e0 add leakyRelu op 2024-04-28 14:53:05 +08:00
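The two leakyRelu entries above add the operator and correct its onnx alpha attribute. For reference, LeakyRelu computes y = x for x >= 0 and y = alpha * x otherwise; a minimal generic CUDA sketch (illustrative only, not the repository's actual kernel):

    __global__ void leakyReluKernel(const float *x, float *y, float alpha, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        // alpha is the onnx attribute (default 0.01) scaling the negative side
        if (i < n)
            y[i] = x[i] >= 0.0f ? x[i] : alpha * x[i];
    }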
zhangyue 5559536470 add kunlun squeeze kernel (#229)
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-28 11:28:28 +08:00
Bolun Zhang fac28c25f6 Add MLU platform distributed acceptance script (#223)
* Add MLU platform distributed acceptance script

* add fp16 test, fix cast

* fix

* add onnxsim for llama

* add matmul tf32 for mlu

* add submodule: onnxsim_large_model

* fix

* modified bang_launch.py, start_single

* add test for albert/opt

* change file path

---------

Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
2024-04-28 11:24:09 +08:00
OdinaryWord 0c94b75a65 add gemm 2024-04-26 16:59:39 +08:00
OdinaryWord 775ce5040d format 2024-04-26 16:01:59 +08:00
OdinaryWord 6ba1a0648a add layernorm 2024-04-26 15:25:41 +08:00
OdinaryWord a765cd2a3d Merge branch 'ascend' of github.com:InfiniTensor/InfiniTensor into ascend 2024-04-25 17:28:18 +08:00
OdinaryWord 8b8f165158 add depthTospace && resize 2024-04-25 17:24:33 +08:00
zhangyue 985d0dee5f Kunlun dist op (#225)
* kunlun dist inference fix

* kunlun distributed

* Add Kunlun distributed scripts and fix issues encountered when running llama

* set -j8

* format

* move run_pytorch.py into cuda/

* update notes

---------

Co-authored-by: weijie01 <weijie01@baidu.com>
Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-23 15:46:25 +08:00
OdinaryWord 5b89c699dc style: fix format 2024-04-10 17:52:23 +08:00
OdinaryWord 2b8823515e style: fix format 2024-04-10 17:52:23 +08:00
OdinaryWord 87f975d969 ascend commit 0410 2024-04-10 16:47:31 +08:00
OdinaryWord 33e1521754 fix 2024-04-10 15:40:30 +08:00
OdinaryWord ec549d260b add communication operator 2024-04-10 15:13:15 +08:00
PanZezhong1725 d1de3ab5c2 feat(dist): support mixed precision in distributed scripts (#226) 2024-04-07 16:57:07 +08:00
Hardy eafbff6cf9 Support kunlun new toolkit (#224)
Co-authored-by: wanghailu <wanghailu0717@163.com>
2024-04-03 09:56:52 +08:00
OdinaryWord dddb40cd93 add conv_transpose 2024-04-02 16:38:40 +08:00
OdinaryWord a5ccf06551 add conv_transpose && native maxpooling 2024-04-01 16:01:36 +08:00
PanZezhong1725 7f6aec6c17 Optimize distributed inference for bert and gpt2 models (#221)
* fix(dist): improve the distributed scripts to print only the absolute error

* feat(dist): add a pytorch run script that can export onnx

* feat(front): add a graph optimization for where operators whose Y value is -inf

* feat(kernel): special-case optimization for pow and div operators whose b operand is a constant

* fix(front): remove the frontend's dependency on global output shape information; drop unnecessary shape infer from the distributed scripts

* feat(kernel): specialized optimization of the expand operation when the matmul bias is a row vector

* fix(kernel): remove unnecessary synchronization in div pow const

* Update expand.cu

* fix: fix comments

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: Derui Yang <ydrml@hotmail.com>
2024-04-01 14:04:28 +08:00
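The pow/div special case in #221 amounts to folding the constant scalar operand into the kernel launch instead of reading a broadcast tensor per element, which is also what lets the extra synchronization be dropped. A hedged sketch of the idea (hypothetical names, not the actual InfiniTensor kernel):

    __global__ void powConstKernel(const float *x, float b, float *y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        // b is passed by value: no second tensor load, no broadcast, no sync
        if (i < n)
            y[i] = powf(x[i], b);
    }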
xiaonans a98573990b Accelerate llama (#219)
* [feature] add cudagraph support

* modify code to pass the cuda_all_reduce test

* modify rope op

* support rmsnorm

* add fp16 support to silu cuda op

* fix bugs in rmsnorm op

* uncomment simplify in onnx.py

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-01 08:46:05 +08:00
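The cudagraph support added here (and split out as #215 further down) follows the standard CUDA-graph pattern: the kernel launches for one decoding step are captured once, then replayed with a single cheap launch per step. A minimal capture/replay sketch under the CUDA runtime API (step_kernel is a stand-in, not the project's code):

    #include <cuda_runtime.h>

    __global__ void step_kernel(float *x) { x[threadIdx.x] += 1.0f; }  // stand-in

    void runDecode(float *d_x, int numSteps) {
        cudaStream_t stream; cudaStreamCreate(&stream);
        cudaGraph_t graph; cudaGraphExec_t graphExec;

        cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
        step_kernel<<<1, 256, 0, stream>>>(d_x);     // recorded, not executed
        cudaStreamEndCapture(stream, &graph);
        cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);

        for (int step = 0; step < numSteps; ++step)  // replay: one launch per step
            cudaGraphLaunch(graphExec, stream);
        cudaStreamSynchronize(stream);
    }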
Chenjie Duan 54a35772fb feature: add parameter to configure matmul compute type (#218)
* feature: add parameter to configure matmul compute type

* fix format
2024-03-26 09:00:45 +08:00
OdinaryWord fc4b62a88c add maxpooling & flatten 2024-03-13 17:25:15 +08:00
OdinaryWord 36e0840f2f support for llama 2024-02-29 14:29:28 +08:00
zhangyue 00e6cc2587 XCCL support (#171)
* add reduce_mean and gather

* fix format

* add kunlun allreduce and cmakefile

* add kunlun allreduce and cmakefile

* delete cmake opt

* fix format

* fix makefile

* add DIST option in Makefile

* add xpu allgather

* delete xpu_wait()

* add xpu allgather

* delete specific compiler

* fix format

* fix gather

* add broadcast

* fix format

* fix

* fix xpu, add where operation, fix element-wise operation

* fix softmax

* fix softmax

* log internal input and output

* fix kunlun gather bugs

* update CMakeList.txt and Makefile

* fix some kunlun kernels

* fix Makefile

* fix Makefile

* set cmake version 3.12

* format

* fix where, gather and support gpt2

* "fix format"

* fix format

* copy onnx.py from master

* use KUNLUN_HOME instead of absolute path

* fix torchvision models

* support torchvision model-zoo

* fix format

* format fix, CMakeList fix

* fix review

* fix vecToString return value

* fix format

* delete empty file

---------

Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-02-29 11:48:35 +08:00
baominghelly b51ccae3b2 fix broken link in docs (#216)
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-02-21 14:03:20 +08:00
xiaonans 1c08ba200c [feature] add cudagraph support (#215)
* [feature] add cudagraph support

* modify code to pass the cuda_all_reduce test
2024-02-21 14:00:25 +08:00
xiaonans 900d8e58e3 Rope and silu (#214)
Add support for the silu and rotary embedding operators.
2024-02-04 11:05:27 +08:00
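Rotary embedding (rope) rotates each (even, odd) pair of query/key channels by a position-dependent angle theta_i = pos * 10000^(-2i/d); the companion silu op is x * sigmoid(x). A hedged single-head rope sketch (layout and names are assumptions, not the #214 kernel):

    __global__ void ropeKernel(float *q, int pos, int d) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per channel pair
        if (2 * i + 1 < d) {
            float theta = pos * powf(10000.0f, -2.0f * i / d);
            float c = cosf(theta), s = sinf(theta);
            float x0 = q[2 * i], x1 = q[2 * i + 1];
            q[2 * i]     = x0 * c - x1 * s;             // 2-D rotation of the pair
            q[2 * i + 1] = x0 * s + x1 * c;
        }
    }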
xiaonans b0876a13ce Merge branch 'master' into rope_and_silu 2024-02-04 10:57:36 +08:00
xiaonans ae9f61de5a add comment for rope operator 2024-02-04 10:57:01 +08:00
xiaonans 9a3c0f11f6 add test for rotary embedding cuda kernel 2024-02-04 10:24:20 +08:00
zhangyunze 67b2bcb7d5 fix some mlu kernel registration & gather op (#210)
* fix: fix bang build/kernel registration | test_onnx

* delete assert float

* fix gather

* fix CMakeLists and Reshape

* fix cncl ops

* add hardsigmoid/hardswish

* fix

* add invalid datatype exception

* fix gather

* fix gather indices type

* fix gather/prelu/hardsigmoid on mlu

* fix format

* fix

---------

Co-authored-by: Bolun Zhang <48948016+Chamberlain0w0@users.noreply.github.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: Zhang Bolun <Chamberlain0w0@gmail.com>
2024-02-01 15:02:02 +08:00
xiaonans 956ce37458 add unittest of silu kernel 2024-01-30 10:40:13 +08:00
zhangyunze 4813204a36 feat: add reshape/identity/squeeze/flatten/unsqueeze op cpu kernel (#213) 2024-01-30 10:29:59 +08:00
OdinaryWord 9db6703b58 add reshape 2024-01-29 15:07:49 +08:00
OdinaryWord e7d34badfb fix format 2024-01-26 16:11:30 +08:00
OdinaryWord f6176124ec add softmax/element_wise kernel 2024-01-26 15:40:21 +08:00
xiaonans 030e5ca9c1 Merge branch 'master' of github.com:InfiniTensor/InfiniTensor into rope_and_silu 2024-01-26 10:16:18 +08:00
xiaonans e8d111ef5d add rope and silu support 2024-01-26 10:01:27 +08:00
xiaonans d1a90ba3e2 [feature] support kvcache with static graph (#209)
* [feature] support kvcache with static graph

* use workspace to optimize kvcache attention

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-01-25 14:20:43 +08:00
xiaonans afed5d3c3d use workspace to optimize kvcache attention 2024-01-25 10:33:01 +08:00