Default Branch

5559536470 · add kunlun squeeze kernel (#229) · Updated 2024-04-28 11:28:28 +08:00

Branches

9b6c44dd40 · warp reduce · Updated 2024-05-11 16:53:02 +08:00

70
14

a889527aa5 · add kunlun layernorm · Updated 2024-05-11 16:24:42 +08:00

0
9

20f651b1d3 · implement instance norm in front · Updated 2024-05-08 17:44:54 +08:00

0
2

5747eb8f7d · modified format · Updated 2024-05-07 16:31:53 +08:00

0
41

7146294baa · memcopy instead of special kernel · Updated 2024-05-06 14:49:39 +08:00

3
5

b0d030d0de · [fix] fix rope op test failing · Updated 2024-04-23 13:51:10 +08:00

3
19

4a5b9572bb · add test scripts for llama2 and 9G models · Updated 2024-04-10 16:23:02 +08:00

3
17

3b7b5740af · allocate workspace from allocator for kunlun runtime · Updated 2024-04-08 15:48:06 +08:00

4
5

6c4dd7b28b · fix(front): change the stub to accept GraphProto as input, stop the distributed script from saving an extra onnx file, and use int64 as the index input type · Updated 2024-04-07 17:15:40 +08:00

3
1

25a3cedeb0 · add pytorch bench · Updated 2024-03-21 10:27:32 +08:00

8
1

a6c919b61d · stream kernel · Updated 2024-03-07 17:01:00 +08:00

9
11

e33131ce5c · fix comment · Updated 2024-01-17 10:57:44 +08:00

25
10

3b5dd7d28c · Merge branch 'master' into update_pybind11 · Updated 2024-01-05 09:20:33 +08:00

27
2

dc6befb549 · fix: fix re-dataMalloc for weight tensor and use of naive allocator · Updated 2023-12-29 17:27:36 +08:00

35
39

a68ac10107 · Enrich dev doc · Updated 2023-12-05 17:14:28 +08:00

38
3

965df4e294 · [feature] add fused attention_kvcache operator support (#179) · Updated 2023-11-14 23:44:22 +08:00

44
0
Included

0a5d273130 · Add: print derivation steps for conv2gemm · Updated 2023-11-10 23:16:44 +08:00

45
1

295450e5f4 · Add: show conv2gemm derivation · Updated 2023-11-10 22:49:07 +08:00

99
96

9272d709da · add a simple mempory pool for allocator · Updated 2023-10-19 12:36:01 +08:00

55
1

ee6dd3deac · update test · Updated 2023-10-12 14:07:23 +08:00

59
5