InfiniTensor/src/ffi
xiaonans a98573990b
Accelerate llama (#219)
* [feature] add cudagraph support

* modify code to pass the cuda_all_reduce test

* modify rope op

* support rmsnorm

* add fp16 support to silu cuda op

* fix bugs in rmsnorm op

* uncomment simplify in onnx.py

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-01 08:46:05 +08:00
ffi_embed.cc Add TVM codegen for MemboundOp (#35) 2022-09-22 18:06:45 +08:00
ffi_infinitensor.cc Accelerate llama (#219) 2024-04-01 08:46:05 +08:00