Commit Graph

  • 80cd1c951e fast float4 cuda-attention xgqdut2016 2024-05-29 15:21:54 +0800
  • bc2440cb98 format ascend zhangyue 2024-05-28 16:03:42 +0800
  • 7b04957699 Update env.sh zhangyue 2024-05-28 15:27:09 +0800
  • abccff0829 Update INSTALL_GUIDE_CN.md for ASCEND Songxin 2024-05-28 11:16:51 +0800
  • faed0682fc remove sync in op OdinaryWord 2024-05-24 14:02:55 +0800
  • c8d60d76e8 Merge branch 'ascend' of github.com:InfiniTensor/InfiniTensor into ascend OdinaryWord 2024-05-23 14:59:22 +0800
  • a1a68d3624 fix resize OdinaryWord 2024-05-23 14:56:16 +0800
  • a66ff430ec one thread deals with multiple data xgqdut2016 2024-05-16 15:35:15 +0800
  • a8443741c4 fix op OdinaryWord 2024-05-15 22:29:52 +0800
  • 9b6c44dd40 warp reduce xgqdut2016 2024-05-11 16:53:02 +0800
  • a889527aa5 add kunlun layernorm dev-leakyrelu zhangyue 2024-05-11 16:24:42 +0800
  • 131d1cb6d0 sub matrix xgqdut2016 2024-05-10 16:42:30 +0800
  • 20f651b1d3 implement instance norm in front instance_norm crapromer 2024-05-08 17:44:54 +0800
  • 3001274969 fix: onnx resize op input is none bug zhangyunze 2024-04-30 10:53:40 +0800
  • 5747eb8f7d modified format xgqdut2016 2024-05-07 16:31:53 +0800
  • 9384cec7de add pad2d kernel xgqdut2016 2024-05-07 16:22:29 +0800
  • 2acb680c64 fix: format Zhang Bolun 2024-05-07 09:42:04 +0800
  • 5862671c0c fix: add comments Zhang Bolun 2024-05-06 17:01:51 +0800
  • 917e82e90c feat: add resize op on Cambricon, fix format Zhang Bolun 2024-05-06 16:45:01 +0800
  • 7146294baa memcopy instead of special kernel cuda-transpose xgqdut2016 2024-05-06 14:49:39 +0800
  • f0509facc6 Merge branch 'master' into ascend Haojie Wang 2024-05-06 10:25:20 +0800
  • 6ad05da684 fix: onnx resize op input is none bug zhangyunze 2024-04-30 10:53:40 +0800
  • 6a89946736 modified format, replace layernorm with instancenorm xgqdut2016 2024-04-30 15:04:12 +0800
  • 0fcaf001c4 add instancenorm, use layernorm to replace instancenorm, error xgqdut2016 2024-04-30 14:56:08 +0800
  • 377b3bf391 fix: onnx resize op input is none bug zhangyunze 2024-04-30 10:53:40 +0800
  • d1799b67a3 fix: onnx resize op input is none bug zhangyunze 2024-04-30 10:53:40 +0800
  • 36baae7615 feat: add LeakyRelu on kunlun, fix the dimension-4 restriction in BatchNorm, run bgan end to end weijie01 2024-04-28 10:42:04 +0800
  • 23b1612192 fix: add LeakyRelu on mlu, fix the dimension-4 restriction in BatchNorm, run BGAN end to end Zhang Bolun 2024-04-25 17:05:06 +0800
  • 77fd137dcb fix: support batchnorm cudnn 2 dimension input zhangyunze 2024-04-25 16:40:15 +0800
  • c6de91ee82 feat: support leaky_relu op zhangyunze 2024-04-25 11:36:08 +0800
  • 907239cf34 fix gemm & avgpooling OdinaryWord 2024-04-29 16:10:32 +0800
  • 47fc0bfa99 modified batchnorm xgqdut2016 2024-04-28 16:34:03 +0800
  • ef4646ec89 modified onnx leakyrelu alpha xgqdut2016 2024-04-28 16:03:14 +0800
  • e6b98fd652 modified format xgqdut2016 2024-04-28 15:02:14 +0800
  • 4d078967e0 add leakyRelu op xgqdut2016 2024-04-28 14:53:05 +0800
  • 5559536470 add kunlun squeeze kernel (#229) master zhangyue 2024-04-28 11:28:28 +0800
  • fac28c25f6 add distributed acceptance script for the MLU platform (#223) Bolun Zhang 2024-04-28 11:24:09 +0800
  • 0c94b75a65 add gemm OdinaryWord 2024-04-26 16:59:39 +0800
  • 775ce5040d format OdinaryWord 2024-04-26 16:01:59 +0800
  • 6ba1a0648a add layernorm OdinaryWord 2024-04-26 15:25:41 +0800
  • a765cd2a3d Merge branch 'ascend' of github.com:InfiniTensor/InfiniTensor into ascend OdinaryWord 2024-04-25 17:28:18 +0800
  • 8b8f165158 add depthTospace&&resize OdinaryWord 2024-04-25 17:24:33 +0800
  • 985d0dee5f Kunlun dist op (#225) zhangyue 2024-04-23 15:46:25 +0800
  • b0d030d0de [fix] fix rope op test failing kvcache_backup xiaonans 2024-04-12 17:22:24 +0800
  • d000f9750c add shape information to the kvcache attention operator xiaonans 2024-04-11 14:52:39 +0800
  • 5b89c699dc style: fix format OdinaryWord 2024-04-10 17:36:23 +0800
  • 2b8823515e style: fix format OdinaryWord 2024-04-10 17:23:13 +0800
  • 87f975d969 ascend commit 0410 OdinaryWord 2024-04-10 16:47:31 +0800
  • 4a5b9572bb add test scripts for llama2 and 9G models kvcache_attention_fp16 xiaonans 2024-04-10 16:23:02 +0800
  • 33e1521754 fix OdinaryWord 2024-04-10 15:40:30 +0800
  • ec549d260b add communication operator OdinaryWord 2024-04-10 15:13:15 +0800
  • 73e3f1fc6f add currency operator xgqdut2016 2024-04-10 15:01:22 +0800
  • 86133c8d0a modified expand xgqdut2016 2024-04-10 11:16:54 +0800
  • 2761d46737 modified div_kernel xgqdut2016 2024-04-10 10:51:35 +0800
  • aa1c3222ed modified transpose and where xgqdut2016 2024-04-10 10:17:45 +0800
  • 159642d6ae merge master xiaonans 2024-04-10 10:03:11 +0800
  • c01e64db50 rope and attention ops support multiple batches/sequences. xiaonans 2024-04-08 12:01:08 +0800
  • 3b7b5740af allocate workspace from allocator for kunlun runtime kunlun_temp kilinchange 2024-04-08 15:48:06 +0800
  • 6c4dd7b28b fix(front): change stub to accept GraphProto as input, stop distributed scripts from saving extra onnx files, use int64 as the index input type dist/graph panzezhong 2024-04-07 17:15:40 +0800
  • d1de3ab5c2 feat(dist): distributed scripts support mixed precision (#226) PanZezhong1725 2024-04-07 16:57:07 +0800
  • e4387904c2 Merge branch 'master' into kunlun_dist_op Haojie Wang 2024-04-03 09:58:21 +0800
  • eafbff6cf9 Support kunlun new toolkit (#224) Hardy 2024-04-03 09:56:52 +0800
  • 14a40a1967 Merge branch 'master' of github.com:InfiniTensor/InfiniTensor into kunlun_dist_op wanghailu 2024-04-03 01:01:40 +0800
  • 32a13b7760 kunlun distributed weijie01 2024-04-02 17:15:08 +0800
  • dddb40cd93 add conv_transpose OdinaryWord 2024-04-02 16:38:40 +0800
  • a71cd14963 kunlun dist inference fix weijie01 2024-04-02 15:30:46 +0800
  • a5ccf06551 add conv_transpose&&native maxpooling OdinaryWord 2024-04-01 16:01:36 +0800
  • 7f6aec6c17 optimize distributed inference for the bert and gpt2 models (#221) PanZezhong1725 2024-04-01 14:04:28 +0800
  • a98573990b Accelerate llama (#219) xiaonans 2024-04-01 08:46:05 +0800
  • eb3a2d123d accelerate cuda attention xiaonans 2024-03-28 09:07:30 +0800
  • 4bdd33522b accelerate cuda fp32 matmul xiaonans 2024-03-26 11:37:54 +0800
  • 54a35772fb feature: add parameter to config matmul compute type (#218) Chenjie Duan 2024-03-26 09:00:45 +0800
  • 25a3cedeb0 add pytorch bench dist_bench Bolun 2024-03-21 02:27:32 +0000
  • 0740d26f43 clean up xiaonans 2024-03-21 10:17:06 +0800
  • fc3d38f80e attention support fp16 xiaonans 2024-03-20 14:56:15 +0800
  • d43364ac60 inter-block communication is fp16 xiaonans 2024-03-19 11:21:14 +0800
  • db053e32a4 kv register is fp16 xiaonans 2024-03-18 17:25:57 +0800
  • 1e797d4ffe cache is fp16 xiaonans 2024-03-18 15:51:19 +0800
  • 80412ae162 fix bugs when blocksize==64 xiaonans 2024-03-18 15:31:52 +0800
  • fc4b62a88c add maxpooling & flatten OdinaryWord 2024-03-13 17:25:15 +0800
  • a6c919b61d stream kernel bang-softmax xgqdut2016 2024-03-07 09:01:00 +0000
  • d4721cb40c modified the memory allocation xgqdut2016 2024-03-06 02:48:45 +0000
  • 6ace4d8ae2 modified test cnnl and bang time xgqdut2016 2024-03-05 09:00:49 +0000
  • 9aaf313c6f modified test_bang_softmax.cc xgqdut2016 2024-02-29 07:19:36 +0000
  • 36e0840f2f support for llama OdinaryWord 2024-02-29 14:29:28 +0800
  • 00e6cc2587 XCCL support (#171) zhangyue 2024-02-29 11:48:35 +0800
  • 1ed4b36db2 add bangSoftmax, compare cnnl and bang C xgqdut2016 2024-02-28 03:14:58 +0000
  • 186a6f37f2 modified the recip xgqdut2016 2024-02-27 08:59:01 +0000
  • 920b23cad8 Modify memory allocation method xgqdut2016 2024-02-26 02:13:04 +0000
  • e5d5085e6a modified tensor.h xgqdut2016 2024-02-23 02:19:30 +0000
  • 66d98a3f04 modified kernel, add nDim parameter xgqdut2016 2024-02-22 03:11:58 +0000
  • ddf90fb19e Merge branch 'master' into bang-softmax xgqdut2016 2024-02-22 10:36:39 +0800
  • c41ad9120d add bang softmax, output error xgqdut2016 2024-02-22 02:35:20 +0000
  • b51ccae3b2 fix broken link in docs (#216) baominghelly 2024-02-21 14:03:20 +0800
  • 1c08ba200c [feature] add cudagraph support (#215) xiaonans 2024-02-21 14:00:25 +0800
  • 83be7fa373 fix bugs in rmsnorm op xiaonans 2024-02-20 10:50:47 +0800
  • 0f1c04d864 add fp16 support to silu cuda op xiaonans 2024-02-19 11:39:21 +0800
  • 936797b960 support rmsnorm xiaonans 2024-02-08 10:40:03 +0800
  • 17bd98d453 modify rope op xiaonans 2024-02-06 17:04:05 +0800
  • 8cc6af0a83 modify code to pass the cuda_all_reduce test xiaonans 2024-02-06 10:41:23 +0800