InfiniTensor/examples/distributed
Latest commit: feccd4f318 by constroy Li
fix tensor parallel for llama (#159)
* fix Slice

* change the default number of timeit rounds to 10 to reduce measurement time

* fix Slice with large ends

* Reshape supports Int64

* support position_ids as input

* skip last MatMul in Llama

* skip infer_shapes to parse large models

* update launch.py

* fix split_concat_kernel

* print more messages in launch.py

* Reshape supports both Int32 and Int64

* try infer_shapes and warn on failure (see the sketch after this commit message)

* fix format

---------

Co-authored-by: whjthu <haojie0429@gmail.com>
2023-10-30 15:04:16 +08:00
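
The "try infer_shapes and warn on failure" item above makes shape inference best-effort so that very large models still parse. Below is a minimal sketch of that pattern, assuming the standard ONNX Python API; `load_onnx_model` is a hypothetical helper name, not necessarily what parallel_opt.py uses:

```python
import warnings

import onnx
from onnx import shape_inference


def load_onnx_model(path: str) -> onnx.ModelProto:
    """Load an ONNX model; treat shape inference as best-effort."""
    model = onnx.load(path)
    try:
        # infer_shapes can fail (or hit protobuf size limits) on very
        # large models such as Llama, so failure is non-fatal here.
        model = shape_inference.infer_shapes(model)
    except Exception as exc:
        warnings.warn(
            f"infer_shapes failed; continuing without inferred shapes: {exc}"
        )
    return model
```

Downgrading the failure to a warning matches the earlier "skip infer_shapes to parse large models" change while still inferring shapes whenever possible.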
File               Last commit                                Date
launch.py          fix tensor parallel for llama (#159)       2023-10-30 15:04:16 +08:00
launch_kvcache.py  Support kvcache (#134)                     2023-09-18 14:17:02 +08:00
parallel.py        impl distributed launch with NCCL (#106)   2023-09-05 09:47:35 +08:00
parallel_opt.py    fix tensor parallel for llama (#159)       2023-10-30 15:04:16 +08:00
placement.py       tensor parallel for transformer (#125)     2023-09-14 14:19:45 +08:00