InfiniTensor/include
constroy Li feccd4f318
fix tensor parallel for llama (#159)
* fix Slice

* change default rounds of timeit to 10 to reduce benchmarking time

* fix slice with large ends

* Reshape supports Int64

* support position_ids as input

* skip last MatMul in Llama

* skip infer_shapes to parse large models

* update launch.py

* fix split_concat_kernel

* print more message in launch.py

* Reshape supports both Int32 and Int64

* try infer_shapes and warn about failure

* fix format

---------

Co-authored-by: whjthu <haojie0429@gmail.com>
2023-10-30 15:04:16 +08:00
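
The "try infer_shapes and warn about failure" item above describes a fallback pattern: attempt a shape-inference pass when loading a model, and continue without inferred shapes if it fails. A minimal sketch of that pattern, where `infer_shapes_or_warn` and `infer_fn` are hypothetical names (the real pass would be something like `onnx.shape_inference.infer_shapes`), not InfiniTensor's actual code:

```python
import warnings


def infer_shapes_or_warn(model, infer_fn):
    """Run a shape-inference pass, falling back to the original model
    with a warning if the pass fails (e.g. on very large models).

    `infer_fn` stands in for a real pass such as
    onnx.shape_inference.infer_shapes; this wrapper is an
    illustrative sketch, not the repository's actual implementation.
    """
    try:
        return infer_fn(model)
    except Exception as exc:  # inference may reject large or unusual graphs
        warnings.warn(f"infer_shapes failed, continuing without shapes: {exc}")
        return model
```

The point of the design is that shape inference is an optimization aid, not a hard requirement, so a failure should degrade gracefully instead of aborting the model load.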
| Path     | Latest commit                                                | Date                      |
|----------|--------------------------------------------------------------|---------------------------|
| bang     | fix bang runtime bug after merging distributed branch (#137) | 2023-09-19 14:10:39 +08:00 |
| core     | fix tensor parallel for llama (#159)                         | 2023-10-30 15:04:16 +08:00 |
| cuda     | Add GatherElements op and cuda kernel (#149)                 | 2023-10-12 09:18:12 +08:00 |
| ffi      | Add TVM codegen for MemboundOp (#35)                         | 2022-09-22 18:06:45 +08:00 |
| intelcpu | Cpu backend2 (#77)                                           | 2023-04-17 12:15:23 +08:00 |
| kunlun   | Xpu (#82)                                                    | 2023-10-16 10:57:08 +08:00 |
| nnet     | Dev for 202303ddl (#66)                                      | 2023-04-18 15:10:33 +08:00 |
| operators | add transpose, concat and split for native cpu (#158)       | 2023-10-12 10:14:28 +08:00 |
| utils    | tensor parallel for transformer (#125)                       | 2023-09-14 14:19:45 +08:00 |
| test.h   | Add python interface for CUDA operator evaluation (#42)      | 2022-09-27 10:41:12 +08:00 |