InfiniTensor

Commit Graph

9b6c44dd40 warp reduce cuda-attention xgqdut2016 2024-05-11 16:53:02 +0800
a889527aa5 add kunlun layernorm dev-leakyrelu zhangyue 2024-05-11 16:24:42 +0800
131d1cb6d0 sub matrix xgqdut2016 2024-05-10 16:42:30 +0800
20f651b1d3 implement instance norm in front instance_norm crapromer 2024-05-08 17:44:54 +0800
3001274969 fix: onnx resize op input is none bug zhangyunze 2024-04-30 10:53:40 +0800
5747eb8f7d modified format ascend xgqdut2016 2024-05-07 16:31:53 +0800
9384cec7de add pad2d kernel xgqdut2016 2024-05-07 16:22:29 +0800
2acb680c64 fix: format Zhang Bolun 2024-05-07 09:42:04 +0800
5862671c0c fix: add comments Zhang Bolun 2024-05-06 17:01:51 +0800
917e82e90c feat: add resize operator on Cambricon, fix format Zhang Bolun 2024-05-06 16:45:01 +0800
7146294baa memcopy instead of special kernel cuda-transpose xgqdut2016 2024-05-06 14:49:39 +0800
f0509facc6 Merge branch 'master' into ascend Haojie Wang 2024-05-06 10:25:20 +0800
6ad05da684 fix: onnx resize op input is none bug zhangyunze 2024-04-30 10:53:40 +0800
6a89946736 modiefied format, replace layernorm as instancenorm xgqdut2016 2024-04-30 15:04:12 +0800
0fcaf001c4 add instancenorm, use layernorm replace instance, error xgqdut2016 2024-04-30 14:56:08 +0800
d1799b67a3 fix: onnx resize op input is none bug zhangyunze 2024-04-30 10:53:40 +0800
36baae7615 feat: add LeakyRelu on kunlun, remove the 4-dimension restriction in BatchNorm, get bgan running weijie01 2024-04-28 10:42:04 +0800
23b1612192 fix: add LeakyRelu on mlu, remove the 4-dimension restriction in BatchNorm, get BGAN running Zhang Bolun 2024-04-25 17:05:06 +0800
77fd137dcb fix: support batchnorm cudnn 2 dimension input zhangyunze 2024-04-25 16:40:15 +0800
c6de91ee82 feat: support leaky_relu op zhangyunze 2024-04-25 11:36:08 +0800
907239cf34 fix gemm & avgpooling OdinaryWord 2024-04-29 16:10:32 +0800
47fc0bfa99 modified batchnorm xgqdut2016 2024-04-28 16:34:03 +0800
ef4646ec89 modified onnx leakyrelu alpha xgqdut2016 2024-04-28 16:03:14 +0800
e6b98fd652 modified format xgqdut2016 2024-04-28 15:02:14 +0800
4d078967e0 add leakyRelu op xgqdut2016 2024-04-28 14:53:05 +0800
5559536470 add kunlun squeeze kernel (#229) master zhangyue 2024-04-28 11:28:28 +0800
fac28c25f6 Add distributed acceptance test script for the MLU platform (#223) Bolun Zhang 2024-04-28 11:24:09 +0800
0c94b75a65 add gemm OdinaryWord 2024-04-26 16:59:39 +0800
775ce5040d format OdinaryWord 2024-04-26 16:01:59 +0800
6ba1a0648a add layernorm OdinaryWord 2024-04-26 15:25:41 +0800
a765cd2a3d Merge branch 'ascend' of github.com:InfiniTensor/InfiniTensor into ascend OdinaryWord 2024-04-25 17:28:18 +0800
8b8f165158 add depthTospace&&resize OdinaryWord 2024-04-25 17:24:33 +0800
985d0dee5f Kunlun dist op (#225) zhangyue 2024-04-23 15:46:25 +0800
b0d030d0de [fix] fix rope op test failing kvcache_backup xiaonans 2024-04-12 17:22:24 +0800
d000f9750c add shape information to the kvcache attention operator xiaonans 2024-04-11 14:52:39 +0800
5b89c699dc style: fix format OdinaryWord 2024-04-10 17:36:23 +0800
2b8823515e style: fix format OdinaryWord 2024-04-10 17:23:13 +0800
87f975d969 ascend commit 0410 OdinaryWord 2024-04-10 16:47:31 +0800
4a5b9572bb add test scripts for llama2 and 9G models kvcache_attention_fp16 xiaonans 2024-04-10 16:23:02 +0800
33e1521754 fix OdinaryWord 2024-04-10 15:40:30 +0800
ec549d260b add communication operator OdinaryWord 2024-04-10 15:13:15 +0800
73e3f1fc6f add currency operator xgqdut2016 2024-04-10 15:01:22 +0800
86133c8d0a modified expand xgqdut2016 2024-04-10 11:16:54 +0800
2761d46737 modified div_kernel xgqdut2016 2024-04-10 10:51:35 +0800
aa1c3222ed modified transpose and where xgqdut2016 2024-04-10 10:17:45 +0800
159642d6ae merge master xiaonans 2024-04-10 10:03:11 +0800
c01e64db50 rope and attention ops support multiple batchs/sequences. xiaonans 2024-04-08 12:01:08 +0800
3b7b5740af allocate workspace from allocator for kunlun runtime kunlun_temp kilinchange 2024-04-08 15:48:06 +0800
6c4dd7b28b fix(front): change stub to accept GraphProto as input, stop the distributed script from saving extra onnx files, use int64 as the index input type dist/graph panzezhong 2024-04-07 17:15:40 +0800
d1de3ab5c2 feat(dist): distributed script supports mixed precision (#226) PanZezhong1725 2024-04-07 16:57:07 +0800
e4387904c2 Merge branch 'master' into kunlun_dist_op Haojie Wang 2024-04-03 09:58:21 +0800
eafbff6cf9 Support kunlun new toolkit (#224) Hardy 2024-04-03 09:56:52 +0800
14a40a1967 Merge branch 'master' of github.com:InfiniTensor/InfiniTensor into kunlun_dist_op wanghailu 2024-04-03 01:01:40 +0800
32a13b7760 kunlun distributed weijie01 2024-04-02 17:15:08 +0800
dddb40cd93 add conv_transpose OdinaryWord 2024-04-02 16:38:40 +0800
a71cd14963 kunlun dist inference fix weijie01 2024-04-02 15:30:46 +0800
a5ccf06551 add conv_transpose&&native maxpooling OdinaryWord 2024-04-01 16:01:36 +0800
7f6aec6c17 Optimizations for distributed inference of bert and gpt2 models (#221) PanZezhong1725 2024-04-01 14:04:28 +0800
a98573990b Accelerate llama (#219) xiaonans 2024-04-01 08:46:05 +0800
eb3a2d123d accelerate cuda attention xiaonans 2024-03-28 09:07:30 +0800
4bdd33522b accelerate cuda fp32 matmul xiaonans 2024-03-26 11:37:54 +0800
54a35772fb feature: add parameter to config matmul compute type (#218) Chenjie Duan 2024-03-26 09:00:45 +0800
25a3cedeb0 add pytorch bench dist_bench Bolun 2024-03-21 02:27:32 +0000
0740d26f43 clean up xiaonans 2024-03-21 10:17:06 +0800
fc3d38f80e attention support fp16 xiaonans 2024-03-20 14:56:15 +0800
d43364ac60 inter-block communication is fp16 xiaonans 2024-03-19 11:21:14 +0800
db053e32a4 kv register is fp16 xiaonans 2024-03-18 17:25:57 +0800
1e797d4ffe cache is fp16 xiaonans 2024-03-18 15:51:19 +0800
80412ae162 fix bugs when blocksize==64 xiaonans 2024-03-18 15:31:52 +0800
fc4b62a88c add maxpooling & flatten OdinaryWord 2024-03-13 17:25:15 +0800
a6c919b61d stream kernel bang-softmax xgqdut2016 2024-03-07 09:01:00 +0000
d4721cb40c modified the memory allocattion xgqdut2016 2024-03-06 02:48:45 +0000
6ace4d8ae2 modified test cnnl and bang time xgqdut2016 2024-03-05 09:00:49 +0000
9aaf313c6f modified test_bang_softmax.cc xgqdut2016 2024-02-29 07:19:36 +0000
36e0840f2f support for llama OdinaryWord 2024-02-29 14:29:28 +0800
00e6cc2587 XCCL support (#171) zhangyue 2024-02-29 11:48:35 +0800
1ed4b36db2 add bangSoftmax , compare cnnl and bang C xgqdut2016 2024-02-28 03:14:58 +0000
186a6f37f2 modified the recip xgqdut2016 2024-02-27 08:59:01 +0000
920b23cad8 Modify memory allocation method xgqdut2016 2024-02-26 02:13:04 +0000
e5d5085e6a modified tensor.h xgqdut2016 2024-02-23 02:19:30 +0000
66d98a3f04 modified kernel, add nDim parameter xgqdut2016 2024-02-22 03:11:58 +0000
ddf90fb19e Merge branch 'master' into bang-softmax xgqdut2016 2024-02-22 10:36:39 +0800
c41ad9120d add bang softmax,output error xgqdut2016 2024-02-22 02:35:20 +0000
b51ccae3b2 fix broken link in docs (#216) baominghelly 2024-02-21 14:03:20 +0800
1c08ba200c [feature] add cudagraph support (#215) xiaonans 2024-02-21 14:00:25 +0800
83be7fa373 fix bugs in rmsnorm op xiaonans 2024-02-20 10:50:47 +0800
0f1c04d864 add fp16 support to silu cuda op xiaonans 2024-02-19 11:39:21 +0800
936797b960 support rmsnorm xiaonans 2024-02-08 10:40:03 +0800
17bd98d453 modify rope op xiaonans 2024-02-06 17:04:05 +0800
8cc6af0a83 modify code to pass the cuda_all_reduce test xiaonans 2024-02-06 10:41:23 +0800
c04910f118 [feature] add cudagraph support xiaonans 2024-02-05 16:19:58 +0800
900d8e58e3 Rope and silu (#214) xiaonans 2024-02-04 11:05:27 +0800
b0876a13ce Merge branch 'master' into rope_and_silu xiaonans 2024-02-04 10:57:36 +0800
ae9f61de5a add comment for rope operator xiaonans 2024-02-04 10:40:25 +0800
9a3c0f11f6 add test for rotary embedding cuda kernel xiaonans 2024-01-30 15:27:04 +0800
67b2bcb7d5 fix mlu some kernel registration & gather op (#210) zhangyunze 2024-02-01 15:02:02 +0800
956ce37458 add unittest of silu kernel xiaonans 2024-01-30 10:40:13 +0800
4813204a36 feat: add reshape/identity/squeeze/flatten/unsqueeze op cpu kernel (#213) zhangyunze 2024-01-30 10:29:59 +0800
9db6703b58 add reshape OdinaryWord 2024-01-29 15:07:49 +0800
e7d34badfb fix format OdinaryWord 2024-01-26 16:11:30 +0800