Default Branch

eafbff6cf9 · Support kunlun new toolkit (#224) · Updated 2024-04-03 09:56:52 +08:00

Branches

da2556d36e · Merge branch 'master' into dist_mlu · Updated 2024-04-03 14:27:48 +08:00 · 0 behind / 4 ahead

e4387904c2 · Merge branch 'master' into kunlun_dist_op · Updated 2024-04-03 09:58:21 +08:00 · 0 behind / 4 ahead

dddb40cd93 · add conv_transpose · Updated 2024-04-02 16:38:40 +08:00 · 21 behind / 20 ahead

8a6d4f4dc3 · Merge branch 'master' into cuda-optimize · Updated 2024-04-01 09:31:18 +08:00 · 2 behind / 5 ahead

4bdd33522b · accelerate cuda fp32 matmul · Updated 2024-03-26 11:37:54 +08:00 · 7 behind / 13 ahead

25a3cedeb0 · add pytorch bench · Updated 2024-03-21 10:27:32 +08:00 · 4 behind / 1 ahead

a6c919b61d · stream kernel · Updated 2024-03-07 17:01:00 +08:00 · 5 behind / 11 ahead

e33131ce5c · fix comment · Updated 2024-01-17 10:57:44 +08:00 · 21 behind / 10 ahead

3b5dd7d28c · Merge branch 'master' into update_pybind11 · Updated 2024-01-05 09:20:33 +08:00 · 23 behind / 2 ahead

dc6befb549 · fix: fix re-dataMalloc for weight tensor and use of naive allocator · Updated 2023-12-29 17:27:36 +08:00 · 31 behind / 39 ahead

a68ac10107 · Enrich dev doc · Updated 2023-12-05 17:14:28 +08:00 · 34 behind / 3 ahead

54f4265296 · modified logic · Updated 2023-11-17 17:43:52 +08:00 · 66 behind / 12 ahead

965df4e294 · [feature] add fused attention_kvcache operator support (#179) · Updated 2023-11-14 23:44:22 +08:00 · 40 behind / 0 ahead · Included

0a5d273130 · Add: print derivation steps for conv2gemm · Updated 2023-11-10 23:16:44 +08:00 · 41 behind / 1 ahead

295450e5f4 · Add: show conv2gemm derivation · Updated 2023-11-10 22:49:07 +08:00 · 95 behind / 96 ahead

9272d709da · add a simple memory pool for allocator · Updated 2023-10-19 12:36:01 +08:00 · 51 behind / 1 ahead

ee6dd3deac · update test · Updated 2023-10-12 14:07:23 +08:00 · 55 behind / 5 ahead

7c484d72b4 · Merge branch 'master' into change_path · Updated 2023-10-12 09:16:12 +08:00 · 55 behind / 2 ahead

6319e12c75 · merge master · Updated 2023-10-10 17:04:09 +08:00 · 56 behind / 33 ahead

de392beb87 · build: update RefactorGraph · Updated 2023-10-06 15:57:51 +08:00 · 63 behind / 85 ahead