InfiniTensor/pyinfinitensor
xiaonans a98573990b
Accelerate llama (#219)
* [feature] add cudagraph support

* modify code to pass the cuda_all_reduce test

* modify rope op

* support rmsnorm

* add fp16 support to silu cuda op

* fix bugs in rmsnorm op

* uncomment simplify in onnx.py

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-01 08:46:05 +08:00
docs feat: create pyinfinitensor frontend 2023-02-13 09:19:05 +08:00
src/pyinfinitensor Accelerate llama (#219) 2024-04-01 08:46:05 +08:00
tests fix mlu some kernel registration & gather op (#210) 2024-02-01 15:02:02 +08:00
pyproject.toml Change onnx-simplifier to onnxsim to resolve build issue on xpu (#164) 2023-10-21 02:58:32 +08:00