Commit Graph

258 Commits

Author SHA1 Message Date
Haojie Wang 08c5d4ea14
Update README.md 2024-08-20 22:10:21 +08:00
Hardy 8b61b0e397
support Ascend (#165)
* fix

* fix code

* fix format

* fix format

* fix

* fix

* addAbs

* more Unary

* add kernels

* fix concat&pooling test code

* add softmax/element_wise kernel

* fix format

* add reshape

* support for llama

* add maxpooling & flatten

* add conv_transpose && native maxpooling

* add conv_transpose

* add communication operator

* fix

* style: fix format

* style: fix format

* add depthTospace && resize

* add layernorm

* format

* add gemm

* add leakyRelu op

* modified format

* modified onnx leakyrelu alpha

* modified batchnorm

* fix gemm & avgpooling

* fix: onnx resize op input is none bug

* add instancenorm; using layernorm to replace instancenorm causes an error

* modified format, replace layernorm with instancenorm

* fix: onnx resize op input is none bug

* add pad2d kernel

* modified format

* fix op

* fix resize

* remove sync in op

* Update INSTALL_GUIDE_CN.md for ASCEND

* Update env.sh

* format

* fix test_resize

* fix resize

* fix test_resize_

* fix test_resize_

* add HcclCommDestroy && use default context

* install onnxruntime

* install onnx-simplifier

* install numpy

* fix bug after merge

* remove CHECK_RET&LOG_PRINT

* fix test_ascend_layernorm

* fix test_cuda_resize

* fix test_ascend_*

* fix format

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: OdinaryWord <sx-hz@163.com>
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: zhangyunze <z13785159769@163.com>
Co-authored-by: Songxin <sx-hz@hotmail.com>
Co-authored-by: zhangyue <138768300+zhangyue207@users.noreply.github.com>
Co-authored-by: zhangyue <zhangyue@qiyuanlab.com>
Co-authored-by: sx941227 <14507528+sx941227@user.noreply.gitee.com>
Co-authored-by: zhangyunze <zhangyunze@qiyuanlab.com>
Co-authored-by: Chenjie Duan <44265800+kilinchange@users.noreply.github.com>
2024-08-20 22:09:33 +08:00
Ziminli 9c0749d1e6
Accommodate AvgPool1d (#253)
* Accommodate 1D average pooling operator

* Fixed pooling mode selection

* Changed pooling parameter initialization and infershape

* add const modifier to NCHWRS parameters

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-08-20 20:11:22 +08:00
Ji Yiming 2e6c00cb05
add kernel selection strategy (#241)
* add kernel selection strategy

* Implement the selection strategy in the Kernel base class
2024-08-13 11:12:00 +08:00
Chenjie Duan 8ead940603
fix rope test (#250) 2024-08-07 10:05:10 +08:00
zhangyunze b9699b0e7a
fix build failure introduced by the ELU commit (#239) 2024-07-12 10:25:30 +08:00
sunjinge d0cfd1e40a
Add ELU operator (#237)
* Add ELU operator

* Format code using clang-format

* Format code using clang-format

* Format code using clang-format

* Format code using clang-format

* fix test

* build.yml

* Update cuda_unary.h

* Update unary.h

* Update unary.cc

* Update unary.cc

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-07-07 17:35:03 +08:00
zhangyunze 62b8866022
Fix issues needed to adapt several models (#230)
* feat: support leaky_relu op

* fix: support batchnorm cudnn 2 dimension input

* fix: add LeakyRelu on MLU, remove the 4-dimension restriction in BatchNorm, get BGAN running

* feat: add LeakyRelu on Kunlun, remove the 4-dimension restriction in BatchNorm, get BGAN running

* fix: onnx resize op input is none bug

* feat: add the resize operator on Cambricon, fix formatting

* fix: add comments

* fix: format

* add kunlun layernorm

* fix: work around the Kunlun layernorm operator's lack of 3-D support (hack)

* fix conflicts

* code format

---------

Co-authored-by: Zhang Bolun <Chamberlain0w0@gmail.com>
Co-authored-by: weijie01 <weijie01@baidu.com>
Co-authored-by: zhangyue <zhangyue@qiyuanlab.com>
Co-authored-by: whjthu <haojie0429@gmail.com>
2024-07-01 14:54:17 +08:00
Jiacheng Huang 26a57ee473
Add `Conv3d` operator and its naive CPU kernel implementation (#236) 2024-06-26 16:07:43 +08:00
Jiacheng Huang 3b0d7beb51 Move the `Conv3d` CPU test to `test/kernels/nativecpu` 2024-06-26 15:18:28 +08:00
Jiacheng Huang 610a6dcf53 Merge branch 'dev-conv3d' of https://github.com/InfiniTensor/InfiniTensor into dev-conv3d 2024-06-25 16:37:05 +08:00
Jiacheng Huang 627fb3a063 Add cuDNN implementation of `Conv3d` 2024-06-25 16:28:38 +08:00
Haojie Wang 76c7d57536
Merge branch 'master' into dev-conv3d 2024-06-25 14:57:53 +08:00
Jiacheng Huang ca151fa727 Add `Conv3d` operator and its naive CPU kernel implementation 2024-06-25 14:41:21 +08:00
baominghelly a0f9522a41
add leakyRelu Op (#233)
* add leakyRelu Op

* Fix clang-format issue

* fix clang format issue

* Remove unused empty line

* install dependencies to avoid git test onnx failure

* remove extra space line

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-06-24 08:07:02 +08:00
crapromer 51d3c9598e
Fix compilation failure with TEST=OFF (#234)
* fix: TEST=OFF compilation failure

* delete #include <variant> in graph_handler.cc
2024-06-12 11:12:49 +08:00
zhangyue 5559536470
add kunlun squeeze kernel (#229)
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-28 11:28:28 +08:00
Bolun Zhang fac28c25f6
Add MLU platform distributed acceptance-test scripts (#223)
* add MLU platform distributed acceptance-test scripts

* add fp16 test, fix cast

* fix

* add onnxsim for llama

* add matmul tf32 for mlu

* add submodule: onnxsim_large_model

* fix

* modified bang_launch.py, start_single

* add test for albert/opt

* change file path

---------

Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
2024-04-28 11:24:09 +08:00
zhangyue 985d0dee5f
Kunlun dist op (#225)
* kunlun dist inference fix

* kunlun distributed

* add Kunlun distributed scripts and fix issues encountered when running llama

* set -j8

* format

* move run_pytorch.py into cuda/

* update notes

---------

Co-authored-by: weijie01 <weijie01@baidu.com>
Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-23 15:46:25 +08:00
PanZezhong1725 d1de3ab5c2
feat(dist): support mixed precision in distributed scripts (#226) 2024-04-07 16:57:07 +08:00
Hardy eafbff6cf9
Support kunlun new toolkit (#224)
Co-authored-by: wanghailu <wanghailu0717@163.com>
2024-04-03 09:56:52 +08:00
PanZezhong1725 7f6aec6c17
Optimizations for distributed inference of BERT and GPT-2 models (#221)
* fix(dist): improve distributed scripts to print only the absolute error

* feat(dist): add a PyTorch run script that can export ONNX

* feat(front): add a graph optimization for where operators whose Y value is -inf

* feat(kernel): special-case optimization for pow and div operators with constant b

* fix(front): remove the frontend's dependency on global output shape information; drop unnecessary shape inference from distributed scripts

* feat(kernel): specialized optimization of the expand operation for matmul when bias is a row vector

* fix(kernel): remove unnecessary synchronization in div pow const

* Update expand.cu

* fix: fix comments

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: Derui Yang <ydrml@hotmail.com>
2024-04-01 14:04:28 +08:00
xiaonans a98573990b
Accelerate llama (#219)
* [feature] add cudagraph support

* modify code to pass the cuda_all_reduce test

* modify rope op

* support rmsnorm

* add fp16 support to silu cuda op

* fix bugs in rmsnorm op

* uncomment simplify in onnx.py

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-01 08:46:05 +08:00
Chenjie Duan 54a35772fb
feature: add parameter to config matmul compute type (#218)
* feature: add parameter to config matmul compute type

* fix format
2024-03-26 09:00:45 +08:00
zhangyue 00e6cc2587
XCCL support (#171)
* add reduce_mean and gather

* fix format

* add kunlun allreduce and cmakefile

* add kunlun allreduce and cmakefile

* delete cmake opt

* fix format

* fix makefile

* add DIST option in Makefile

* add xpu allgather

* delete xpu_wait()

* add xpu allgather

* delete specific compiler

* fix format

* fix gather

* add broadcast

* fix format

* fix

* fix xpu, add where operation, fix element-wise operation

* fix softmax

* fix softmax

* log internal input and output

* fix kunlun gather bugs

* update CMakeList.txt and Makefile

* fix some kunlun kernels

* fix Makefile

* fix Makefile

* set cmake version 3.12

* format

* fix where, gather and support gpt2

* fix format

* fix format

* copy onnx.py from master

* use KUNLUN_HOME instead of absolute path

* fix torchvision models

* support torchvision model-zoo

* fix format

* format fix, CMakeList fix

* fix review

* fix vecToString return value

* fix format

* delete empty file

---------

Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-02-29 11:48:35 +08:00
baominghelly b51ccae3b2
fix broken link in docs (#216)
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-02-21 14:03:20 +08:00
xiaonans 1c08ba200c
[feature] add cudagraph support (#215)
* [feature] add cudagraph support

* modify code to pass the cuda_all_reduce test
2024-02-21 14:00:25 +08:00
xiaonans 900d8e58e3
Rope and silu (#214)
Add support for the silu and rotary embedding operators.
2024-02-04 11:05:27 +08:00
xiaonans b0876a13ce
Merge branch 'master' into rope_and_silu 2024-02-04 10:57:36 +08:00
xiaonans ae9f61de5a add comment for rope operator 2024-02-04 10:57:01 +08:00
xiaonans 9a3c0f11f6 add test for rotary embedding cuda kernel 2024-02-04 10:24:20 +08:00
zhangyunze 67b2bcb7d5
fix mlu some kernel registration & gather op (#210)
* fix: fix bang build/kernel registration | test_onnx

* delete assert float

* fix gather

* fix CMakeLists and Reshape

* fix cncl ops

* add hardsigmoid/hardswish

* fix

* add invalid datatype exception

* fix gather

* fix gather indices type

* fix gather/prelu/hardsigmoid on mlu

* fix format

* fix

---------

Co-authored-by: Bolun Zhang <48948016+Chamberlain0w0@users.noreply.github.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: Zhang Bolun <Chamberlain0w0@gmail.com>
2024-02-01 15:02:02 +08:00
xiaonans 956ce37458 add unittest of silu kernel 2024-01-30 10:40:13 +08:00
zhangyunze 4813204a36
feat: add reshape/identity/squeeze/flatten/unsqueeze op cpu kernel (#213) 2024-01-30 10:29:59 +08:00
xiaonans 030e5ca9c1 Merge branch 'master' of github.com:InfiniTensor/InfiniTensor into rope_and_silu 2024-01-26 10:16:18 +08:00
xiaonans e8d111ef5d add rope and silu support 2024-01-26 10:01:27 +08:00
xiaonans d1a90ba3e2
[feature] support kvcache with static graph (#209)
* [feature] support kvcache with static graph

* use workspace to optimize kvcache attention

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-01-25 14:20:43 +08:00
xiaonans afed5d3c3d use workspace to optimize kvcache attention 2024-01-25 10:33:01 +08:00
Haojie Wang a5062f3f89
Update README.md 2024-01-24 22:16:48 +08:00
Hardy 09b2ecf98a
support more data types on mlu (#211)
* support more data types

* clang format

* fix little bug

* fix cncl datatype

* fix format

---------

Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: Zhang Bolun <Chamberlain0w0@gmail.com>
2024-01-24 13:33:33 +08:00
xiaonans 6a1bfd6c45 [feature] support kvcache with static graph 2024-01-17 11:38:44 +08:00
Chenjie Duan 51086d2b8d
Modify kernel registration & support fp16 (#205)
* - Remove dataType from the kernel registration.

* - support fp16 for conv

* - cpu kernel: adapt the new registration mechanism

* modified all register kernel

* add where fp16

* add layernorm fp16

* add split_concat fp16

* - element_wise support fp16

* feat: support transpose fp16

* feat: support sliceOp fp16

* - unary support fp16

* - feat: support reduceOp fp16

* feat: support matmulOp/expandOp fp16

* feat: support powOp int8

* add cuda cast & support half-precision for gather

* style: fix style

* feat: support int8 for gather

* style: fix style

* modified test_cuda_conv_transposed

* fix: fix dist code to support fp16

* fix(graph.cc): fix topo_sort

* fix: fix recv and send kernel registration

* feat: add field tensors for stub

* refactor(frontend): sort first, then build the graph

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: provide a tensor-to-node mapping for intermediate results

* fix (slice): add guard for area out of range

* fix: fix matmul fp16

* fix: fix re-dataMalloc for weight tensor and use of naive allocator

* feat: add dataType filter for cuda kernel

* feat: bang kernel adapt the new registration mechanism

* fix: fix some error on mlu

* feat: intelcpu kernel adapt the new registration mechanism

* feat: modify kernel registration on kunlun

* fix intelcpu compiler bug

* feat: bang reshape support all dataType

* fix: fix bang reduce

* fix(all_reduce.cc): fix as reviewer suggested

* fix: fix style and restore unary test codes

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: zhangyunze <z13785159769@163.com>
Co-authored-by: OdinaryWord <sx-hz@163.com>
Co-authored-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>
2024-01-15 11:02:13 +08:00
zhangyunze 58993d4339
Remove the frontend's dependency on ONNX infershape (#206)
* feat: SqueezeOp lift the dependency of onnx infershape.

* feat: UnsqueezeOp lift the dependency of onnx infershape.

* feat: lift the dependency of onnx infershape

* fix: fix Makefile off nccl
2024-01-12 14:54:27 +08:00
PanZezhong1725 46e61a5bd4
Fix out-of-bounds memory access in Slice (#204)
fix (slice): add guard for area out of range

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-01-05 09:19:50 +08:00
zhangyunze b15c4979fa
fix Issue #189, questions 1-15 (#195)
* fix: fix nativecpu elementwise only support 4d tensor

* fix format

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-01-05 08:40:18 +08:00
Hardy 42032356fb
Bang cncl (#163)
* MLU CNCL base

* add FindCNCL.cmake, not find -lcncl

* bangPrintFloat not found

* docker: make succeeds, tests error

* delete net file and onnxtest.py

* init

* fix cncl

* format

* fix

* format

* fix cncl

* run dist gpt2 on mlu

* format

* fix import error on mlu docker

* run llama single card

* run distributed llama2

* add test for slice/reduce on mlu

* fix cncl related test

* fix format

* format

* delete comments

* change GPU to MLU

* modify launch script

* fix name

* fix format

* fix gather

* format python script

---------

Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: Bolun <chamberlain0w0@gmail.com>
Co-authored-by: Bolun Zhang <48948016+Chamberlain0w0@users.noreply.github.com>
2024-01-03 13:28:03 +08:00
Chenjie Duan 83f1de93d0
add frontend resize kernel (#194)
* - add frontend resize kernel

* - fix resize test

* - fix bug
- add onnx test for resize

* fix: modify codes as reviewer suggested

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-12-29 13:32:56 +08:00
zhangyunze 3967b437c8
fix Issue 187 split infershape wrong (#197)
* fix: fix splitOp to support unequal portions

* fix: fix as review comment

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-12-28 21:39:24 +08:00
Chenjie Duan 6e7bd6ca0c
fix(perf.py): change NNmodel commit to fix perf.py (#203) 2023-12-28 21:31:39 +08:00
Hardy 5ac0ab442f
Fix bang (#198)
* fix bang batchnorm

* fix pooling test bang

* add test batchnorm

* high-precision activation

* fix pooling

* fix matmul

* fix test

* add layernorm

* fix softmax

* fix

* better code

* fix

* fix workflow

* fix workflow

* fix

* fix

* fix matmul

* add LRN

* fix lrn

* fix lrn

---------

Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: Baoming Li <1508269885@qq.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-12-28 13:44:10 +08:00