forked from jiuyuan/InfiniTensor
a98573990b
* [feature] add cudagraph support * modify code to pass the cuda_all_reduce test * modify rope op * support rmsnorm * add fp16 support to silu cuda op * fix bugs in rmsnorm op * uncomment simplify in onnx.py --------- Co-authored-by: Haojie Wang <haojie0429@gmail.com> |
||
---|---|---|
.. | ||
ffi_embed.cc | ||
ffi_infinitensor.cc |