zhangyue
a889527aa5
add kunlun layernorm
2024-05-11 16:24:42 +08:00
Zhang Bolun
2acb680c64
fix: format
2024-05-07 09:42:04 +08:00
Zhang Bolun
5862671c0c
fix: add comments
2024-05-06 17:01:51 +08:00
Zhang Bolun
917e82e90c
feat: add resize operator on Cambricon, fix format
2024-05-06 16:45:01 +08:00
zhangyunze
d1799b67a3
fix: onnx resize op input is none bug
2024-04-30 10:56:00 +08:00
weijie01
36baae7615
feat: add LeakyRelu on kunlun, remove the 4-dimension restriction in BatchNorm, get bgan running
2024-04-30 10:54:30 +08:00
Zhang Bolun
23b1612192
fix: add LeakyRelu on mlu, remove the 4-dimension restriction in BatchNorm, get BGAN running
2024-04-30 10:54:30 +08:00
zhangyunze
77fd137dcb
fix: support batchnorm cudnn 2 dimension input
2024-04-30 10:54:30 +08:00
zhangyunze
c6de91ee82
feat: support leaky_relu op
2024-04-30 10:54:30 +08:00
zhangyue
5559536470
add kunlun squeeze kernel ( #229 )
...
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-28 11:28:28 +08:00
Bolun Zhang
fac28c25f6
Add distributed acceptance test scripts for the MLU platform ( #223 )
...
* add distributed acceptance test scripts for the MLU platform
* add fp16 test, fix cast
* fix
* add onnxsim for llama
* add matmul tf32 for mlu
* add submodule: onnxsim_large_model
* fix
* modified bang_launch.py, start_single
* add test for albert/opt
* change file path
---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
2024-04-28 11:24:09 +08:00
zhangyue
985d0dee5f
Kunlun dist op ( #225 )
...
* kunlun dist inference fix
* kunlun distributed
* add Kunlunxin distributed scripts and fix issues encountered when running llama
* set -j8
* format
* move run_pytorch.py into cuda/
* update notes
---------
Co-authored-by: weijie01 <weijie01@baidu.com>
Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-23 15:46:25 +08:00
PanZezhong1725
d1de3ab5c2
feat(dist): support mixed precision in distributed scripts ( #226 )
2024-04-07 16:57:07 +08:00
Hardy
eafbff6cf9
Support kunlun new toolkit ( #224 )
...
Co-authored-by: wanghailu <wanghailu0717@163.com>
2024-04-03 09:56:52 +08:00
PanZezhong1725
7f6aec6c17
Optimizations for distributed inference of bert and gpt2 models ( #221 )
...
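* fix(dist): improve distributed scripts to print only the absolute error
* feat(dist): add a pytorch run script that can export onnx
* feat(front): add graph optimization for where operators whose Y value is -inf
* feat(kernel): add special-case optimization for pow and div operators when b is a constant
* fix(front): remove the frontend's dependency on global output shape information; remove unnecessary shape infer from distributed scripts
* feat(kernel): specialized optimization of the expand operation when the matmul bias is a row vector
* fix(kernel): remove unnecessary synchronization in div pow const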
* Update expand.cu
* fix: fix comments
---------
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: Derui Yang <ydrml@hotmail.com>
2024-04-01 14:04:28 +08:00
xiaonans
a98573990b
Accelerate llama ( #219 )
...
* [feature] add cudagraph support
* modify code to pass the cuda_all_reduce test
* modify rope op
* support rmsnorm
* add fp16 support to silu cuda op
* fix bugs in rmsnorm op
* uncomment simplify in onnx.py
---------
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-01 08:46:05 +08:00
Chenjie Duan
54a35772fb
feature: add parameter to config matmul compute type ( #218 )
...
* feature: add parameter to config matmul compute type
* fix format
2024-03-26 09:00:45 +08:00
zhangyue
00e6cc2587
XCCL support ( #171 )
...
* add reduce_mean and gather
* fix format
* add kunlun allreduce and cmakefile
* add kunlun allreduce and cmakefile
* delete cmake opt
* fix format
* fix makefile
* add DIST option in Makefile
* add xpu allgather
* delete xpu_wait()
* add xpu allgather
* delete specific compiler
* fix format
* fix gather
* add broadcast
* fix format
* fix
* fix xpu, add where operation, fix element-wise operation
* fix softmax
* fix softmax
* log internal input and output
* fix kunlun gather bugs
* update CMakeList.txt and Makefile
* fix some kunlun kernels
* fix Makefile
* fix Makefile
* set cmake version 3.12
* format
* fix where, gather and support gpt2
* "fix format"
* fix format
* copy onnx.py from master
* use KUNLUN_HOME instead of absolute path
* fix torchvision models
* support torchvision model-zoo
* fix format
* format fix, CMakeList fix
* fix review
* fix vecToString return value
* fix format
* delete empty file
---------
Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-02-29 11:48:35 +08:00
baominghelly
b51ccae3b2
fix broken link in docs ( #216 )
...
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-02-21 14:03:20 +08:00
xiaonans
1c08ba200c
[feature] add cudagraph support ( #215 )
...
* [feature] add cudagraph support
* modify code to pass the cuda_all_reduce test
2024-02-21 14:00:25 +08:00
xiaonans
900d8e58e3
Rope and silu ( #214 )
...
Add support for the silu and rotary embedding operators.
2024-02-04 11:05:27 +08:00
xiaonans
b0876a13ce
Merge branch 'master' into rope_and_silu
2024-02-04 10:57:36 +08:00
xiaonans
ae9f61de5a
add comment for rope operator
2024-02-04 10:57:01 +08:00
xiaonans
9a3c0f11f6
add test for rotary embedding cuda kernel
2024-02-04 10:24:20 +08:00
zhangyunze
67b2bcb7d5
fix mlu some kernel registration & gather op ( #210 )
...
* fix: fix bang build/kernel registration | test_onnx
* delete assert float
* fix gather
* fix CMakeLists and Reshape
* fix cncl ops
* add hardsigmoid/hardswish
* fix
* add invalid datatype exception
* fix gather
* fix gather indices type
* fix gather/prelu/hardsigmoid on mlu
* fix format
* fix
---------
Co-authored-by: Bolun Zhang <48948016+Chamberlain0w0@users.noreply.github.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: Zhang Bolun <Chamberlain0w0@gmail.com>
2024-02-01 15:02:02 +08:00
xiaonans
956ce37458
add unittest of silu kernel
2024-01-30 10:40:13 +08:00
zhangyunze
4813204a36
feat: add reshape/identity/squeeze/flatten/unsqueeze op cpu kernel ( #213 )
2024-01-30 10:29:59 +08:00
xiaonans
030e5ca9c1
Merge branch 'master' of github.com:InfiniTensor/InfiniTensor into rope_and_silu
2024-01-26 10:16:18 +08:00
xiaonans
e8d111ef5d
add rope and silu support
2024-01-26 10:01:27 +08:00
xiaonans
d1a90ba3e2
[feature] support kvcache with static graph ( #209 )
...
* [feature] support kvcache with static graph
* use workspace to optimize kvcache attention
---------
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-01-25 14:20:43 +08:00
xiaonans
afed5d3c3d
use workspace to optimize kvcache attention
2024-01-25 10:33:01 +08:00
Haojie Wang
a5062f3f89
Update README.md
2024-01-24 22:16:48 +08:00
Hardy
09b2ecf98a
support more data type on mlu ( #211 )
...
* support more data type
* clang format
* fix little bug
* fix cncl datatype
* fix format
---------
Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: Zhang Bolun <Chamberlain0w0@gmail.com>
2024-01-24 13:33:33 +08:00
xiaonans
6a1bfd6c45
[feature] support kvcache with static graph
2024-01-17 11:38:44 +08:00
Chenjie Duan
51086d2b8d
Modify kernel registration & support fp16 ( #205 )
...
* - Remove dataType from the kernel registration.
* - support fp16 for conv
* - cpu kernel: adapt the new registration mechanism
* modified all register kernel
* add where fp16
* add layernorm fp16
* add split_concat fp16
* - element_wise support fp16
* feat: support transpose fp16
* feat: support sliceOp fp16
* - unary support fp16
* - feat: support reduceOp fp16
* feat: support matmulOp/expandOp fp16
* feat: support powOp int8
* add cuda cast & support half-precision for gather
* style: fix style
* feat:support int8 for gather
* style:fix style
* modified test_cuda_conv_transposed
* fix: fix dist code to support fp16
* fix(graph.cc): fix topo_sort
* fix: fix recv and send kernel registration
* feat: add field tensors for stub
* refactor(frontend): sort first, then build the graph
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: provide a tensor-to-node mapping for intermediate results
* fix (slice): add guard for area out of range
* fix: fix matmul fp16
* fix: fix re-dataMalloc for weight tensor and use of naive allocator
* feat: add dataType filter for cuda kernel
* feat: bang kernel adapt the new registration mechanism
* fix: fix some error on mlu
* feat: intelcpu kernel adapt the new registration mechanism
* feat: modify kernel registration on kunlun
* fix intelcpu compiler bug
* feat: bang reshape support all dataType
* fix: fix bang reduce
* fix(all_reduce.cc): fix as reviewer suggested
* fix: fix style and restore unary test codes
---------
Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: zhangyunze <z13785159769@163.com>
Co-authored-by: OdinaryWord <sx-hz@163.com>
Co-authored-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>
2024-01-15 11:02:13 +08:00
zhangyunze
58993d4339
Remove the frontend's dependency on onnx infershape ( #206 )
...
* feat: SqueezeOp lift the dependency of onnx infershape.
* feat: UnsqueezeOp lift the dependency of onnx infershape.
* feat: lift the dependency of onnx infershape
* fix: fix Makefile off nccl
2024-01-12 14:54:27 +08:00
PanZezhong1725
46e61a5bd4
Fix out-of-bounds memory access in Slice ( #204 )
...
fix (slice): add guard for area out of range
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-01-05 09:19:50 +08:00
zhangyunze
b15c4979fa
fix Issue-189 question 1-15 ( #195 )
...
* fix: fix nativecpu elementwise only support 4d tensor
* fix format
---------
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-01-05 08:40:18 +08:00
Hardy
42032356fb
Bang cncl ( #163 )
...
* MLU CNCL base
* add FindCNCL.cmake, not find -lcncl
* bangPrintFloat not find
* docker:make successful, test error
* delete net file and onnxtest.py
* init
* fix cncl
* format
* fix
* format
* fix cncl
* run dist gpt2 on mlu
* format
* fix import error on mlu docker
* run llama single card
* run distributed llama2
* add test for slice/reduce on mlu
* fix cncl related test
* fix format
* format
* delete comments
* change GPU to MLU
* MLU CNCL base
* add FindCNCL.cmake, not find -lcncl
* bangPrintFloat not find
* docker:make successful, test error
* delete net file and onnxtest.py
* init
* fix cncl
* format
* fix
* format
* fix cncl
* run dist gpt2 on mlu
* format
* fix import error on mlu docker
* run llama single card
* run distributed llama2
* add test for slice/reduce on mlu
* fix cncl related test
* fix format
* format
* delete comments
* change GPU to MLU
* modify launch script
* fix name
* fix format
* fix gather
* format python script
---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: Bolun <chamberlain0w0@gmail.com>
Co-authored-by: Bolun Zhang <48948016+Chamberlain0w0@users.noreply.github.com>
2024-01-03 13:28:03 +08:00
Chenjie Duan
83f1de93d0
add frontend resize kernel ( #194 )
...
* - add frontend resize kernel
* - fix resize test
* - fix bug
- add onnx test for resize
* fix: modify codes as reviewer suggested
---------
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-12-29 13:32:56 +08:00
zhangyunze
3967b437c8
fix Issue 187 split infershape wrong ( #197 )
...
* fix: fix splitOp to support unequal portions
* fix: fix as review comment
---------
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-12-28 21:39:24 +08:00
Chenjie Duan
6e7bd6ca0c
fix(perf.py): change NNmodel commit to fix perf.py ( #203 )
2023-12-28 21:31:39 +08:00
Hardy
5ac0ab442f
Fix bang ( #198 )
...
* fix bang batchnorm
* fix pooling test bang
* add test batchnorm
* HIGH PRECISION ACTIVATION
* fix pooling
* fix matmul
* fix test
* add layernorm
* fix softmax
* fix
* better code
* fix
* fix workflow
* fix workflow
* fix
* fix
* fix matmul
* add LRN
* fix lrn
* fix lrn
---------
Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: Baoming Li <1508269885@qq.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-12-28 13:44:10 +08:00
Chenjie Duan
3f34372012
- modify error info when kernel not found ( #191 )
...
* - modify error info when kernel not found
* - modify code as reviewer suggested
---------
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-12-27 09:43:57 +08:00
learner2468
9a9587556c
Add examples: inference of Paddle models ( #192 )
...
* Add paddle model and infer with InfiniTensor
* Remove unused import
---------
Co-authored-by: kilinchange <44265800+kilinchange@users.noreply.github.com>
【Hackathon No.106】Add paddle model and infer with InfiniTensor
2023-12-14 19:42:43 +08:00
xgqdut2016
a3929c25f8
Add send and recv operators based on NCCL ( #182 )
...
* baseline sendrecv, bug
* success sendrecv
* get rank from comm
* set output shape
* successful:set output shape equal to input shape
* shape as attribute
* success:shape as attribute
* success send recv, output 0
* add onnx test
* split send and recv
* success split send and recv
* test-onnx bug
* success test-onnx
* modified onnx.py
* solve review
2023-12-14 16:38:03 +08:00
Derui Yang
c143eebdf7
Model storage that does not depend on onnx models ( #196 )
...
Signed-off-by: YdrMaster <ydrml@hotmail.com>
2023-12-11 10:44:06 +08:00
Hardy
67974aee8a
Fix https://github.com/InfiniTensor/InfiniTensor/pull/160 ( #185 )
...
Co-authored-by: wanghailu <wanghailu0717@163.com>
2023-11-27 14:18:12 +08:00
Hardy
3ead20a23a
Fix workspace & bang conv ( #183 )
...
* fix bang workspace
* fix convbpdata
* fix code
* add code
* fix
* fix
* fix conv
* fix test conv
---------
Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-11-24 15:16:25 +08:00
xgqdut2016
a7293c12ba
Add layer normalization ( #181 )
...
* - add layernorm kernel
* success:add layernorm kernel and test
* fix: remove unusable comments
* fix: modify code as reviewer suggested
* debug,modified .cu and test
* optional bias support
* overloading function
* fix bug after merging; remove time constraint in conv test
---------
Co-authored-by: kilinchange <kilinchange@163.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-11-24 15:15:14 +08:00