InfiniTensor/src/core
xiaonans a98573990b
Accelerate llama (#219)
* [feature] add cudagraph support

* modify code to pass the cuda_all_reduce test

* modify rope op

* support rmsnorm

* add fp16 support to silu cuda op

* fix bugs in rmsnorm op

* uncomment simplify in onnx.py

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2024-04-01 08:46:05 +08:00
..
blob.cc memory_allocator (#103) 2023-08-13 13:39:35 +08:00
common.cc add code for backtrace (#21) 2022-09-01 20:30:12 +08:00
data_type.cc Framework support for building bert/gpt2 model graphs (#94) 2023-08-29 16:06:52 +08:00
dummy_mutator.cc refactor(core): add new `OpType` definition (#99) 2023-08-07 11:17:05 +08:00
graph.cc Modify kernel registration & support fp16 (#205) 2024-01-15 11:02:13 +08:00
graph_handler.cc Accelerate llama (#219) 2024-04-01 08:46:05 +08:00
graph_match.cc ADD: sub graph replacement. (#56) 2023-04-17 13:09:07 +08:00
lazy_allocator.cc support Dynamic tensor infer shape and fix memory pool (#176) 2023-11-23 13:11:50 +08:00
op_type.cc [Hackathon No.108] Add Gelu operator, ffi, kernel for cpu and gpu. (#148) 2023-10-10 15:21:13 +08:00
operator.cc use workspace to optimize kvcache attention 2024-01-25 10:33:01 +08:00
perf_engine.cc Support fp16 dtype (#96) 2023-08-02 16:38:16 +08:00
runtime.cc Modify kernel registration & support fp16 (#205) 2024-01-15 11:02:13 +08:00
search_engine.cc refactor(core): add new `OpType` definition (#99) 2023-08-07 11:17:05 +08:00
tensor.cc XCCL support (#171) 2024-02-29 11:48:35 +08:00
tensor_base.cc refactor: consolidate methods for manipulating tensor data 2023-03-21 14:00:04 +08:00