forked from jiuyuan/InfiniTensor
a98573990b

* [feature] add cudagraph support
* modify code to pass the cuda_all_reduce test
* modify rope op
* support rmsnorm
* add fp16 support to silu cuda op
* fix bugs in rmsnorm op
* uncomment simplify in onnx.py

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>

| Name |
|---|
| bang |
| core |
| cuda |
| ffi |
| intelcpu |
| kunlun |
| nnet |
| operators |
| utils |
| test.h |