| File | Last commit | Date |
| --- | --- | --- |
| test_cuda_G2BMM.cc | ADD: add mkl runtime for intel cpu, and add mkl kernel for matmul/conv/convtransposed. (#61) | 2023-03-27 21:28:49 +08:00 |
| test_cuda_GBMM.cc | ADD: add mkl runtime for intel cpu, and add mkl kernel for matmul/conv/convtransposed. (#61) | 2023-03-27 21:28:49 +08:00 |
| test_cuda_all_gather.cc | impl distributed launch with NCCL (#106) | 2023-09-05 09:47:35 +08:00 |
| test_cuda_all_reduce.cc | impl distributed launch with NCCL (#106) | 2023-09-05 09:47:35 +08:00 |
| test_cuda_attention.cc | use workspace to optimize kvcache attention | 2024-01-25 10:33:01 +08:00 |
| test_cuda_batch_norm.cc | ADD: add mkl runtime for intel cpu, and add mkl kernel for matmul/conv/convtransposed. (#61) | 2023-03-27 21:28:49 +08:00 |
| test_cuda_broadcast.cc | impl distributed launch with NCCL (#106) | 2023-09-05 09:47:35 +08:00 |
| test_cuda_clip.cc | memory_allocator (#103) | 2023-08-13 13:39:35 +08:00 |
| test_cuda_concat.cc | Modify kernel registration & support fp16 (#205) | 2024-01-15 11:02:13 +08:00 |
| test_cuda_conv.cc | memory_allocator (#103) | 2023-08-13 13:39:35 +08:00 |
| test_cuda_conv_fp16.cc | memory_allocator (#103) | 2023-08-13 13:39:35 +08:00 |
| test_cuda_conv_transposed_2d.cc | Modify kernel registration & support fp16 (#205) | 2024-01-15 11:02:13 +08:00 |
| test_cuda_element_wise.cc | add CUDNN impl for Min and Max (#118) | 2023-08-22 16:19:29 +08:00 |
| test_cuda_expand.cc | Framework supports graph construction for bert/gpt2 models (#94) | 2023-08-29 16:06:52 +08:00 |
| test_cuda_extend.cc | memory_allocator (#103) | 2023-08-13 13:39:35 +08:00 |
| test_cuda_gather.cc | Framework supports graph construction for bert/gpt2 models (#94) | 2023-08-29 16:06:52 +08:00 |
| test_cuda_gather_elements.cc | Add GatherElements op and cuda kernel (#149) | 2023-10-12 09:18:12 +08:00 |
| test_cuda_inception.cc | Pooling ceil mode (#155) | 2023-10-09 20:51:39 +08:00 |
| test_cuda_layernorm.cc | Modify kernel registration & support fp16 (#205) | 2024-01-15 11:02:13 +08:00 |
| test_cuda_matmul.cc | memory_allocator (#103) | 2023-08-13 13:39:35 +08:00 |
| test_cuda_pad.cc | memory_allocator (#103) | 2023-08-13 13:39:35 +08:00 |
| test_cuda_pooling.cc | Pooling ceil mode (#155) | 2023-10-09 20:51:39 +08:00 |
| test_cuda_reduce.cc | Add ReduceSum op and kernel (#160) | 2023-11-24 09:29:58 +08:00 |
| test_cuda_reshape.cc | memory_allocator (#103) | 2023-08-13 13:39:35 +08:00 |
| test_cuda_resize.cc | memory_allocator (#103) | 2023-08-13 13:39:35 +08:00 |
| test_cuda_rope.cc | add test for rotary embedding cuda kernel | 2024-02-04 10:24:20 +08:00 |
| test_cuda_sendrecv.cc | Add send and recv operators based on NCCL (#182) | 2023-12-14 16:38:03 +08:00 |
| test_cuda_slice.cc | memory_allocator (#103) | 2023-08-13 13:39:35 +08:00 |
| test_cuda_softmax.cc | Modify kernel registration & support fp16 (#205) | 2024-01-15 11:02:13 +08:00 |
| test_cuda_split.cc | Modify kernel registration & support fp16 (#205) | 2024-01-15 11:02:13 +08:00 |
| test_cuda_transpose.cc | Add cuda transpose kernel (#115) | 2023-08-22 14:22:15 +08:00 |
| test_cuda_unary.cc | add unittest of silu kernel | 2024-01-30 10:40:13 +08:00 |
| test_cuda_where.cc | Modify kernel registration & support fp16 (#205) | 2024-01-15 11:02:13 +08:00 |
| test_perfengine.cc | ADD: add mkl runtime for intel cpu, and add mkl kernel for matmul/conv/convtransposed. (#61) | 2023-03-27 21:28:49 +08:00 |