forked from jiuyuan/InfiniTensor
* [feature] add cudagraph support
* modify code to pass the cuda_all_reduce test
* modify rope op
* support rmsnorm
* add fp16 support to silu cuda op
* fix bugs in rmsnorm op
* uncomment simplify in onnx.py

Co-authored-by: Haojie Wang <haojie0429@gmail.com>

* blob.cc
* common.cc
* data_type.cc
* dummy_mutator.cc
* graph.cc
* graph_handler.cc
* graph_match.cc
* lazy_allocator.cc
* op_type.cc
* operator.cc
* perf_engine.cc
* runtime.cc
* search_engine.cc
* tensor.cc
* tensor_base.cc