Commit Graph

  • ff98241db7 modified test_cuda_conv_transposed xgqdut2016 2023-12-14 14:44:28 +0800
  • 046b2d68d8 style: fix style OdinaryWord 2023-12-14 13:35:08 +0800
  • 2af4c1276b feat: support int8 for gather OdinaryWord 2023-12-14 13:28:41 +0800
  • db8c3eec15 style: fix style OdinaryWord 2023-12-14 11:32:07 +0800
  • c29dcf1e6d add cuda cast & support half-precision for gather OdinaryWord 2023-12-14 11:24:25 +0800
  • 5ed7db1506 feat: support powOp int8 zhangyunze 2023-12-14 11:15:28 +0800
  • bdb8d8d65f feat: support matmulOp/expandOp fp16 zhangyunze 2023-12-14 11:01:46 +0800
  • cbdeb73e86 - feat: support reduceOp fp16 kilinchange 2023-12-13 17:39:39 +0800
  • 5af7f1e753 - unary support fp16 kilinchange 2023-12-13 17:05:17 +0800
  • ee4ecd27e2 feat: support sliceOp fp16 zhangyunze 2023-12-13 16:55:16 +0800
  • d5e775397d feat: support transpose fp16 zhangyunze 2023-12-13 16:36:20 +0800
  • 4b02de7e17 - element_wise support fp16 kilinchange 2023-12-12 15:15:05 +0800
  • bef4731811 Merge branch 'master' of github.com:InfiniTensor/InfiniTensor into xpu_xccl zhangyue 2023-12-13 14:19:32 +0800
  • e07516ebe9 Merge branch 'master' into support_fp16 xgqdut2016 2023-12-11 16:50:02 +0800
  • dd4a90fb5e add split_concat fp16 xgqdut2016 2023-12-11 16:45:16 +0800
  • fda0a5f982 add layernorm fp16 xgqdut2016 2023-12-11 15:05:34 +0800
  • c143eebdf7 Model storage that does not depend on onnx models (#196) Derui Yang 2023-12-11 10:44:06 +0800
  • 8b2e3b8e19 add where fp16 xgqdut2016 2023-12-08 16:57:49 +0800
  • a000cb0304 modified all register kernel xgqdut2016 2023-12-07 17:53:28 +0800
  • 15c1e7519f log internal input and output zhangyue 2023-12-07 15:31:43 +0800
  • c587901586 - cpu kernel: adapt the new registration mechanism kilinchange 2023-12-05 10:49:28 +0800
  • a68ac10107 Enrich dev doc add_paddle_model learner2468 2023-12-05 17:14:28 +0800
  • 6d62350631 Change function name and add dev doc learner2468 2023-12-05 17:10:46 +0800
  • 57954fd523 Add paddle model and use InfiniTensor to infer learner2468 2023-12-05 16:53:28 +0800
  • c19256bca6 - support fp16 for conv kilinchange 2023-11-30 14:36:43 +0800
  • 61f6954c99 [feat] add cudagraph support xiaonans 2023-12-01 15:38:01 +0800
  • 4db6699e09 - Remove dataType from the kernel registration. kilinchange 2023-11-30 13:51:24 +0800
  • cf4d7b1639 fix softmax zhangyue 2023-11-30 10:39:34 +0800
  • 8fca12456d Merge branch 'master' of github.com:InfiniTensor/InfiniTensor into xpu_xccl zhangyue 2023-11-29 14:28:07 +0800
  • 815d0ebf44 cleaning xiaonans 2023-11-28 16:29:48 +0800
  • 2fb1c8cf32 gemv2N to gemv2T xiaonans 2023-11-27 16:19:01 +0800
  • 67974aee8a Fix https://github.com/InfiniTensor/InfiniTensor/pull/160 (#185) test-models cuda-cast Hardy 2023-11-27 14:18:12 +0800
  • 3ead20a23a Fix workspace & bang conv (#183) Hardy 2023-11-24 15:16:25 +0800
  • a7293c12ba Add layer normalization (#181) xgqdut2016 2023-11-24 15:15:14 +0800
  • 86877509c1 remove cudamalloc in attention op xiaonans 2023-11-24 13:14:06 +0800
  • 6ece3f4a77 Add ReduceSum op and kernel (#160) PanZezhong1725 2023-11-24 09:29:58 +0800
  • 595a9906d2 add infer index function (#175) xgqdut2016 2023-11-24 09:24:25 +0800
  • 331f7ab2b8 support Dynamic tensor infer shape and fix memory pool (#176) zhangyunze 2023-11-23 13:11:50 +0800
  • 0adac91385 kvcache_attention support reduce intra blocks xiaonans 2023-11-21 17:30:21 +0800
  • 54f4265296 modified logic cuda-attention xgqdut2016 2023-11-17 17:43:52 +0800
  • 965df4e294 [feature] add fused attention_kvcache operator support (#179) point2point xiaonans 2023-11-14 23:44:22 +0800
  • 0cd30252c4 Merge branch 'xpu_xccl' of github.com:InfiniTensor/InfiniTensor into xpu_xccl zhangyue 2023-11-14 17:03:42 +0800
  • 7896e0a647 fix softmax zhangyue 2023-11-14 16:57:22 +0800
  • 269e4ea40c add test to attention_kvcache op xiaonans 2023-11-14 10:21:34 +0800
  • 0f8858ca75 Merge branch 'master' into xpu_xccl zhangyue 2023-11-13 13:58:25 +0800
  • 0a5d273130 Add: print derivation steps for conv2gemm NNET_231111_from_master wanghailu0717 2023-11-10 23:16:10 +0800
  • 295450e5f4 Add: show conv2gemm derivation NNET_231111 Liyan Zheng 2023-11-10 22:49:07 +0800
  • f22fa2766e add reduce_mean and gather on bang (#167) Hardy 2023-11-10 18:02:44 +0800
  • 50862df765 [Kunlun & CUDA & BANG] add depth2space operator (#178) Hardy 2023-11-10 17:58:26 +0800
  • 1ea450882b add reduce_mean and gather on kunlun (#169) Hardy 2023-11-10 17:52:09 +0800
  • 2436ccb868 [feature] add fused attention_kvcache operator support xiaonans 2023-11-10 10:51:44 +0800
  • 1bc674036a Merge branch 'master' into xpu_xccl zhangyue 2023-11-08 13:57:16 +0800
  • d3e7543291 Cuda softmax (#129) xgqdut2016 2023-11-06 08:56:23 +0800
  • 39484e0cc4 add kernels OdinaryWord 2023-11-03 14:43:21 +0800
  • 1a6fccccbe test: support compiling einnet unit tests, though not all tests pass (#174) Derui Yang 2023-11-03 13:21:49 +0800
  • 025b171aed Merge branch 'fix_kunlun_platform' into xpu_xccl wanghailu 2023-11-01 13:35:36 +0800
  • 5a50cbf900 fix xpu, add where operation, fix element-wise operation wanghailu 2023-11-01 09:56:21 +0800
  • 4d643f8178 Merge branch 'master' into fix_kunlun_platform wanghailu 2023-10-31 14:04:11 +0800
  • 786ead08cd fix wanghailu 2023-10-31 14:03:10 +0800
  • ec3adf6fa7 support 8D tensor, add test example (#170) xgqdut2016 2023-10-31 10:47:36 +0800
  • c7305f73a6 MERGE xpu_allgather to xpu_xccl (#173) zhangyue 2023-10-31 09:55:20 +0800
  • 97283d8de3 fix format zhangyue 2023-10-30 17:26:14 +0800
  • 9112ec5a2f add broadcast zhangyue 2023-10-30 17:24:59 +0800
  • da5863ce49 Merge branch 'master' into fix_kunlun_platform wanghailu 2023-10-30 17:05:23 +0800
  • 7e212a9e66 fix gather wanghailu 2023-10-30 17:03:33 +0800
  • ca368298fa fix format zhangyue 2023-10-30 16:05:06 +0800
  • c35715a451 delete specific compiler zhangyue 2023-10-30 16:03:30 +0800
  • 23b825efc4 Xpu task4 support: add softmax (#172) Bolun Zhang 2023-10-30 16:01:05 +0800
  • 140fad244d Merge branch 'xpu_xccl' of github.com:InfiniTensor/InfiniTensor into xpu_allgather zhangyue 2023-10-30 15:57:01 +0800
  • 286425edc9 add xpu allgather zhangyue 2023-10-30 15:56:53 +0800
  • 21c3fa6735 delete xpu_wait() zhangyue 2023-10-30 15:19:05 +0800
  • 9293c09c22 add xpu allgather zhangyue 2023-10-30 15:04:50 +0800
  • feccd4f318 fix tensor parallel for llama (#159) constroy Li 2023-10-30 15:04:16 +0800
  • a9bd73528d more Unary OdinaryWord 2023-10-30 11:24:53 +0800
  • eee503c65d add DIST option in Makefile zhangyue 2023-10-30 10:51:21 +0800
  • 0703958f4f fix makefile zhangyue 2023-10-30 10:32:51 +0800
  • f34d30136e fix format zhangyue 2023-10-30 10:31:03 +0800
  • 91a89649bf delete cmake opt zhangyue 2023-10-30 09:48:38 +0800
  • 40a2d35783 add kunlun allreduce and cmakefile zhangyue 2023-10-27 15:53:54 +0800
  • 2dcba7e26a add kunlun allreduce and cmakefile zhangyue 2023-10-27 15:52:57 +0800
  • 95ee579338 addAbs OdinaryWord 2023-10-26 16:37:03 +0800
  • 11e2b08be3 fix wanghailu0717 2023-10-26 10:18:15 +0800
  • cc057bcf80 fix wanghailu0717 2023-10-26 10:08:58 +0800
  • 36360d6a69 fix format wanghailu 2023-10-25 15:08:19 +0800
  • 7f5188bedd remove dimension limit of elementwise operators on xpu (#168) Haojie Wang 2023-10-25 14:38:47 +0800
  • e3b86ee298 add reduce_mean and gather wanghailu 2023-10-25 13:59:53 +0800
  • 6b06ab0534 fix format wanghailu 2023-10-23 14:54:10 +0800
  • 412f301323 fix format wanghailu 2023-10-23 10:48:35 +0800
  • 07ef587c65 Change onnx-simplifier to onnxsim to resolve build issue on xpu (#164) baominghelly 2023-10-21 02:58:32 +0800
  • b1bdbbf478 Merge branch 'master' into ascend Haojie Wang 2023-10-21 02:57:51 +0800
  • 56634b3b19 fix code wanghailu 2023-10-20 15:06:08 +0800
  • b6ff4514fe fix wanghailu0717 2023-10-20 14:08:39 +0800
  • 9272d709da add a simple memory pool for allocator allocator_memPool kilinchange 2023-10-17 17:22:43 +0800
  • d0f9792613 Fix: add building option for NNet (#162) Derui Yang 2023-10-16 19:53:28 +0800
  • 1184fa131f Xpu (#82) Hardy 2023-10-16 10:57:08 +0800
  • ee6dd3deac update test test_codegen Bolun 2023-10-12 14:07:23 +0800
  • 8e4d88fb9f add transpose, concat and split for native cpu (#158) Haojie Wang 2023-10-12 10:14:28 +0800
  • c774c9182d fix Bolun 2023-10-12 09:59:08 +0800
  • 36ae7b7fb6 Add GatherElements op and cuda kernel (#149) PanZezhong1725 2023-10-12 09:18:12 +0800
  • 7c484d72b4 Merge branch 'master' into change_path change_path Haojie Wang 2023-10-12 09:16:12 +0800