
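PanZezhong1725 7f6aec6c17 Optimizations for distributed inference of BERT and GPT-2 models (#221) * fix(dist): improve the distributed script to print only the absolute error * feat(dist): add a PyTorch run script that can export ONNX * feat(front): add a graph optimization for the where operator when the Y value is -inf * feat(kernel): add special-case optimizations for pow and div operators when b is a constant * fix(front): remove the frontend's dependency on global output shape information; remove unnecessary shape inference from the distributed script * feat(kernel): add a specialized optimization for the expand operation in matmul when bias is a row vector * fix(kernel): remove unnecessary synchronization in div/pow const * Update expand.cu * fix: fix comments --------- Co-authored-by: Haojie Wang <haojie0429@gmail.com> Co-authored-by: Derui Yang <ydrml@hotmail.com>		2024-04-01 14:04:28 +08:00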
README.md	Optimizations for distributed inference of BERT and GPT-2 models (#221)	2024-04-01 14:04:28 +08:00
bang_launch.py	Bang cncl (#163)	2024-01-03 13:28:03 +08:00
cuda_launch.py	Optimizations for distributed inference of BERT and GPT-2 models (#221)	2024-04-01 14:04:28 +08:00
launch_kunlun.py	XCCL support (#171)	2024-02-29 11:48:35 +08:00
launch_kvcache.py	Support kvcache (#134)	2023-09-18 14:17:02 +08:00
parallel.py	impl distributed launch with NCCL (#106)	2023-09-05 09:47:35 +08:00
parallel_opt.py	Optimizations for distributed inference of BERT and GPT-2 models (#221)	2024-04-01 14:04:28 +08:00
placement.py	tensor parallel for transformer (#125)	2023-09-14 14:19:45 +08:00
run_pytorch.py	Optimizations for distributed inference of BERT and GPT-2 models (#221)	2024-04-01 14:04:28 +08:00

Distributed scripts

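Use --export_onnx to set the directory the ONNX model is exported to; the default is the current path ./. Without this flag, the script only runs the computation and generates the input and output files.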
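The flag semantics described above (absent, given without a value, given with a directory) can be sketched with argparse. This is only an illustration of one way such a flag may be parsed, not the actual contents of run_pytorch.py; the defaults for --batch_size and --length and the helper names build_parser and describe are assumptions made for the example.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the options shown in this README.
    parser = argparse.ArgumentParser(description="Sketch of run_pytorch.py options")
    parser.add_argument("--model", type=str, required=True)
    parser.add_argument("--batch_size", type=int, default=1)  # assumed default
    parser.add_argument("--length", type=int, default=1)      # assumed default
    # nargs="?" makes the value optional:
    #   flag absent        -> None  (no ONNX export)
    #   flag without value -> "./"  (export to the current directory)
    #   flag with value    -> that directory
    parser.add_argument("--export_onnx", type=str, nargs="?", const="./", default=None)
    return parser


def describe(argv):
    # Report what a script with these flag semantics would do.
    args = build_parser().parse_args(argv)
    if args.export_onnx is None:
        return "compute only"
    return f"export ONNX to {args.export_onnx}"


if __name__ == "__main__":
    print(describe(["--model", "gpt2"]))
    print(describe(["--model", "gpt2", "--export_onnx"]))
    print(describe(["--model", "gpt2", "--export_onnx", "./out"]))
```

Using const together with nargs="?" is what lets a single flag act both as a switch and as a path option.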

python run_pytorch.py --model gpt2 --batch_size 1 --length 1 --export_onnx ./

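This generates the input and output files test_inputs.npy and test_results.npy in the current directory. Only a single input and a single output are currently supported.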
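To check the generated files before using them, they can be loaded with NumPy. The sketch below assumes each file holds one plain array saved with np.save (consistent with the single input/output limitation, but not confirmed by the source). The helpers load_reference and max_abs_error are hypothetical names introduced here; an absolute-error comparison is one common way to check another backend's output against the reference.

```python
from pathlib import Path

import numpy as np


def load_reference(directory="."):
    # Load the single input and single output arrays written by the PyTorch run.
    directory = Path(directory)
    inputs = np.load(directory / "test_inputs.npy")
    results = np.load(directory / "test_results.npy")
    return inputs, results


def max_abs_error(expected, actual):
    # Largest element-wise absolute difference between two arrays.
    expected = np.asarray(expected)
    actual = np.asarray(actual)
    return float(np.max(np.abs(expected - actual)))


if __name__ == "__main__":
    # Only inspect the files if they are present in the current directory.
    if Path("test_inputs.npy").exists() and Path("test_results.npy").exists():
        inputs, results = load_reference(".")
        print("input:", inputs.shape, inputs.dtype)
        print("result:", results.shape, results.dtype)
```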

To run an exported ONNX model with cuda_launch.py using 4 processes per node:

python cuda_launch.py --model "/XXX/XXX.onnx" --nproc_per_node 4