2024-10-15 14:05:02 +08:00

# ModelTest README

ModelTest provides performance and accuracy testing for large language models.

Currently supported:

1. NPU, PA scenario: performance tests / max batch_size probing / accuracy tests, float16, single-node and multi-node multi-card
2. GPU, FA scenario: accuracy tests, float16

Features:

1. Performance testing:
    1. End-to-end latency and throughput for a specified batch size and input/output length, plus first-token and non-first-token latency and throughput.
    2. Max batch_size probing: for a specified input/output length, search range, and maximum non-first-token latency, report the largest feasible batch_size together with its performance data and intermediate results.
2. Accuracy testing on downstream datasets: CEval, BoolQ, HumanEval, HumanEval_X, CMMLU, MMLU, TruthfulQA

Supported PA models:

1. LLaMA (LLaMA-7B, LLaMA-13B, LLaMA-33B, LLaMA-65B, LLaMA2-7B, LLaMA2-13B, LLaMA2-70B, LLaMA3-8B, LLaMA3-70B)
2. Starcoder (Starcoder-15.5B, Starcoder2-15B)
3. ChatGLM (ChatGLM2-6B, ChatGLM3-6B, ChatGLM3-6B-32K)
4. CodeGeeX2-6B
5. Baichuan2 (Baichuan2-7B, Baichuan2-13B)
6. Qwen (Qwen-7B, Qwen-14B, Qwen-72B, Qwen1.5-14B, Qwen-14B-chat, Qwen-72B-chat, Qwen1.5-0.5B-chat, Qwen1.5-4B-chat, Qwen1.5-7B, Qwen1.5-14B-chat, Qwen1.5-32B-chat, Qwen1.5-72B)
7. Aquila (Aquila-7B)
8. Deepseek (Deepseek16B, Deepseek-LLM-7B, Deepseek-LLM-67B, Deepseek-Coder-1.3B, Deepseek-Coder-6.7B, Deepseek-Coder-7B, Deepseek-Coder-33B)
9. Mixtral (Mixtral-8x7B)
10. Bloom-7B
11. Baichuan1 (Baichuan1-7B, Baichuan1-13B)
12. CodeLLaMA (CodeLLaMA-7B, CodeLLaMA-13B, CodeLLaMA-34B, CodeLLaMA-70B)
13. Yi (Yi-6B-200K, Yi-34B)
14. Chinese Alpaca (Chinese-Alpaca-13B)
15. Vicuna (Vicuna-7B, Vicuna-13B)
16. Internlm (Internlm-20B, Internlm2-7B, Internlm2-20B)
17. Gemma (Gemma-2B, Gemma-7B)
18. Mistral (Mistral-7B-Instruct-v0.2)
19. Ziya (Ziya-Coding-34B)
20. CodeShell (CodeShell-7B)

# Usage

### Environment variables

```shell
# Source the CANN environment variables
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# Source the acceleration library (ATB) environment variables
source /usr/local/Ascend/nnal/atb/set_env.sh
# Source the environment variables extracted from the model repository tar package
source set_env.sh
# Select the devices to use
export ASCEND_RT_VISIBLE_DEVICES="[device IDs]" # NPU scenario, e.g. "0,1,2,3,4,5,6,7"
# or
export CUDA_VISIBLE_DEVICES="[device IDs]" # GPU scenario, e.g. "0,1,2,3,4,5,6,7"
```

### Install Python dependencies

```
pip install -r requirements.txt
```

### Run commands

```
General notes:

1. model_name:

LLaMA-7B, LLaMA-13B, LLaMA-33B, LLaMA-65B, LLaMA2-7B, LLaMA2-13B, LLaMA2-70B, LLaMA3-8B, LLaMA3-70B: llama
CodeLLaMA-7B, CodeLLaMA-13B, CodeLLaMA-34B, CodeLLaMA-70B: codellama
Chinese-Alpaca-13B: llama
Yi-6B-200K, Yi-34B: yi
Vicuna-7B, Vicuna-13B: vicuna
Starcoder-15.5B: starcoder
Starcoder2-15B: starcoder2
ChatGLM2-6B: chatglm2_6b
ChatGLM3-6B, ChatGLM3-6B-32K: chatglm3_6b
CodeGeeX2-6B: codegeex2_6b
Baichuan2-7B: baichuan2_7b
Baichuan2-13B: baichuan2_13b
Internlm-20B, Internlm2-7B, Internlm2-20B: internlm
Qwen-7B, Qwen-14B, Qwen-72B, Qwen1.5-14B, Qwen-14B-chat, Qwen-72B-chat, Qwen1.5-0.5B-chat, Qwen1.5-4B-chat, Qwen1.5-7B, Qwen1.5-14B-chat, Qwen1.5-32B-chat, Qwen1.5-72B: qwen
Aquila-7B: aquila_7b
Deepseek16B: deepseek
Deepseek-LLM-7B, Deepseek-LLM-67B: deepseek_llm
Deepseek-Coder-1.3B, Deepseek-Coder-6.7B, Deepseek-Coder-7B, Deepseek-Coder-33B: deepseek_coder
Mixtral-8x7B: mixtral
Mistral-7B-Instruct-v0.2: mistral
BLOOM-7B1, BLOOM-176B: bloom
Baichuan1-7B: baichuan2_7b
Baichuan1-13B: baichuan2_13b
Gemma-2B, Gemma-7B: gemma
Ziya-Coding-34B: ziya
CodeShell-7B: codeshell_7b

2. is_chat_model: whether to use the chat variant of the model; pass "chat" for the chat variant, pass "base" or omit the argument for the base variant
3. weight_dir: path to the model weights
4. chip_num: number of devices to use
5. max_position_embedding: optional; if omitted, the default value from the model config is used
```

#### Performance test (specified batch_size)

```
# NPU

## Single-node
bash run.sh pa_fp16 performance [case_pair] [batch_size] [model_name] ([is_chat_model]) [weight_dir] [chip_num] ([max_position_embedding/max_sequence_length])

Examples:
bash run.sh pa_fp16 performance [[256,256]] 1 qwen /home/user/repo/LLaMA-Factory-310P3/ms_cache/hub/qwen/Qwen-7B 8
bash run.sh pa_fp16 performance [[256,256]] 1 qwen /home/user/repo/LLaMA-Factory-310P3/ms_cache/hub/qwen/Qwen-7B 1
bash run.sh pa_fp16 performance [[256,256]] 1 llama /home/user/repo/LLaMA-Factory-310P3/ms_cache/hub/modelscope/llama-2-7b-ms 8
bash run.sh pa_fp16 performance [[256,256]] 1 llama /home/user/repo/LLaMA-Factory-310P3/ms_cache/hub/modelscope/llama-2-7b-ms 1
bash run.sh pa_fp16 performance [[256,256]] 1 chatglm2_6b /home/user/repo/LLaMA-Factory-310P3/ms_cache/hub/zhipuai/chatglm2-6b 8
bash run.sh pa_fp16 performance [[256,256]] 1 chatglm2_6b /home/user/repo/LLaMA-Factory-310P3/ms_cache/hub/zhipuai/chatglm2-6b 1
bash run.sh pa_fp16 performance [[256,256]] 1 baichuan2_7b /home/user/repo/LLaMA-Factory-310P3/ms_cache/hub/baichuan-inc/Baichuan2-7B-Base 8
bash run.sh pa_fp16 performance [[256,256]] 1 baichuan2_7b /home/user/repo/LLaMA-Factory-310P3/ms_cache/hub/baichuan-inc/Baichuan2-7B-Base 1

## Multi-node
bash run.sh pa_fp16 performance [case_pair] [batch_size] [model_name] ([is_chat_model]) [weight_dir] [rank_table_file] [world_size] [node_num] [rank_id_start] ([max_position_embedding/max_sequence_length])

or

# GPU
Not supported

Notes:
1. case_pair accepts one or more pairs in the form [[seq_in_1,seq_out_1],...,[seq_in_n,seq_out_n]] with no spaces, e.g. [[256,256],[512,512]]. Note that when multiple pairs are given, they should be sorted by seq_in + seq_out in descending order, e.g. [[2048,2048],[1024,1024],[512,512],[256,256]]; otherwise the measured performance may be inaccurate.
2. batch_size accepts a single value, multiple values, or multiple groups:
    1. Single value: a number or [number], e.g. 1 or [1]
    2. Multiple values: comma-separated numbers, optionally bracketed, e.g. 1,4 or [1,4]
    3. Multiple groups: the number of groups must match case_pair and correspond one-to-one, in the form [[bs1,bs2],...,[bs3,bs4]]. For example, with case_pair [[256,256],[512,512]] and batch_size [[1,4],[1,8]], batches 1 and 4 are tested for [256,256], and batches 1 and 8 for [512,512].
3. After the run completes, the folder holding the saved data is printed at the end of the console output.
4. In the multi-node scenario:
    1. rank_table_file: path string pointing to the rank table, usually a JSON file, e.g. /home/rank_table.json
    2. world_size: number, total device count summed over all nodes, e.g. 16
    3. node_num: number of nodes (machines), e.g. 2
    4. rank_id_start: number, the starting rank_id on the current node; with 2 machines and 16 devices, the first machine uses rank_id_start 0 and the second uses 8
    Notes:
    1. By default each node uses world_size / node_num devices, i.e. devices are split evenly across nodes
    2. By default each node uses its first world_size / node_num devices
    3. world_size, node_num, and rank_id_start must be consistent with the information in rank_table_file
```

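Because run.sh takes case_pair and batch_size as single positional arguments, a stray space inside them silently shifts every following argument. The sketch below is a hypothetical pre-flight check for illustration only; it is not part of run.sh:

```shell
# Hypothetical pre-flight check (not part of run.sh): bracketed arguments
# such as case_pair or batch_size must be passed without spaces,
# e.g. [[2048,2048],[1024,1024],[512,512],[256,256]] is valid.
check_no_spaces() {
  case "$1" in
    *" "*) echo "invalid: argument must not contain spaces"; return 1 ;;
    *)     echo "ok" ;;
  esac
}
check_no_spaces '[[256,256], [512,512]]' || true   # rejected: contains a space
check_no_spaces '[[2048,2048],[1024,1024],[512,512],[256,256]]'
```

Running such a check before launching a long multi-hour test run is cheaper than discovering a malformed argument after warmup.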
#### Performance test (probe max batch_size)

```
# NPU

## Single-node
bash run.sh pa_fp16 performance_maxbs [case_pair] [batch_range] [time_limit] [model_name] ([is_chat_model]) [weight_dir] [chip_num] ([max_position_embedding/max_sequence_length])

## Multi-node
Not supported

or

# GPU
Not supported

Notes:
1. case_pair accepts one or more pairs in the form [[seq_in_1,seq_out_1],...,[seq_in_n,seq_out_n]] with no spaces, e.g. [[256,256],[512,512]]
2. batch_range accepts one or more ranges; the number of ranges must match case_pair. For each case_pair, the largest batch_size satisfying time_limit is searched for within the corresponding range.
   The format is [[lb1,rb1],...,[lbn,rbn]], where every range is a closed interval, e.g. [[1,1000],[200,300]]
3. time_limit: the maximum allowed non-first-token latency while probing the max batch_size.
4. Results are saved under [...]/tests/modeltest/result/<model name>/:
    1. files ending in "_round_result.csv" hold the intermediate data
    2. files ending in "_final_result.csv" hold the final data, which is also printed at the end of the console output
```

#### Accuracy test (downstream datasets)

```
# NPU

## Single-node
bash run.sh pa_fp16 [dataset] ([shots]) [batch_size] [model_name] ([is_chat_model]) [weight_dir] [chip_num] ([max_position_embedding/max_sequence_length])

Example:
bash run.sh pa_fp16 full_MMLU 0 1 qwen /home/user/repo/LLaMA-Factory-310P3/ms_cache/hub/qwen/Qwen-7B 8

## Multi-node
bash run.sh pa_fp16 [dataset] ([shots]) [batch_size] [model_name] ([is_chat_model]) [weight_dir] [rank_table_file] [world_size] [node_num] [rank_id_start] ([max_position_embedding/max_sequence_length])

or

# GPU
bash run.sh fa [dataset] ([shots]) [batch_size] [model_name] ([is_chat_model]) [weight_dir] [chip_num]

Notes:
1. dataset accepts one value, chosen from the following list as needed:
   [full_CEval|full_BoolQ|full_HumanEval|full_HumanEval_X|full_MMLU|full_CMMLU|full_TruthfulQA|full_LongBench]
2. batch_size accepts a single value or multiple values:
    1. Single value: a number or [number], e.g. 1 or [1]
    2. Multiple values: comma-separated numbers, optionally bracketed, e.g. 1,4 or [1,4]
3. shots: when testing full_CEval, full_MMLU, and full_CMMLU, the number of shots to use, e.g. 0 or 5
4. After the run completes, the folder holding the saved data is printed at the end of the console output.
5. In the multi-node scenario, the multi-node notes in the "Performance test (specified batch_size)" section apply.
6. When testing full_HumanEval_X, refer to dataset\full\HumanEval_X\install_humaneval_x_dependency.sh to set up the dependency environment.
```

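The accepted batch_size spellings above (a bare number, comma-separated numbers, or either wrapped in one pair of brackets) can be checked mechanically. This validator is a hypothetical illustration of that grammar, not a function shipped with ModelTest:

```shell
# Hypothetical validator for the batch_size argument: a number,
# comma-separated numbers, or the same wrapped in one pair of brackets.
is_valid_batch_size() {
  printf '%s' "$1" | grep -Eq '^(\[[0-9]+(,[0-9]+)*\]|[0-9]+(,[0-9]+)*)$'
}
is_valid_batch_size "1"     && echo "1 is valid"
is_valid_batch_size "[1,4]" && echo "[1,4] is valid"
is_valid_batch_size "1, 4"  || echo "spaces are rejected"
```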
#### Single-case test (performance/accuracy)

```
# NPU

## Performance
bash run.sh pa_fp16 performance_single [case_pair] [input_text_or_file] [batch_size] [model_name] ([is_chat_model]) [weight_dir] [chip_num] ([max_position_embedding/max_sequence_length])

Example:
bash /usr/local/Ascend/llm_model/tests/modeltest/run.sh pa_fp16 performance_single [[1024,1024]] ./ 1 qwen /home/user/repo/LLaMA-Factory-310P3/ms_cache/hub/qwen/Qwen-7B 8

## Accuracy
bash run.sh pa_fp16 precision_single [case_pair] [input_text_or_file] [batch_size] [model_name] ([is_chat_model]) [weight_dir] [chip_num] ([max_position_embedding/max_sequence_length])

or

# GPU

## Performance
Not supported

## Accuracy
bash run.sh fa precision_single [case_pair] [input_text_or_file] [batch_size] [model_name] ([is_chat_model]) [weight_dir] [chip_num] ([max_position_embedding/max_sequence_length])

Notes:
1. case_pair accepts one or more pairs in the form [[seq_in_1,seq_out_1],...,[seq_in_n,seq_out_n]] with no spaces, e.g. [[256,256],[512,512]]
2. input_text_or_file accepts one or more text inputs or a single file input, e.g. '["hello","hi"]' or input.txt
3. batch_size accepts a single value or multiple values:
    1. Single value: a number or [number], e.g. 1 or [1]
    2. Multiple values: comma-separated numbers, optionally bracketed, e.g. 1,4 or [1,4]
4. seq_in is determined by the actual input (during performance tests it affects the memory allocated for warmup), and the batch count follows the actual number of inputs (if input_text_or_file contains more inputs than batch_size, the actual input count is used as the batch_size)
5. After the run completes, the folder holding the saved data is printed at the end of the console output.
```

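One way to prepare a file for input_text_or_file is sketched below. The one-prompt-per-line layout is an assumption for illustration; check the file format expected by your version of ModelTest:

```shell
# Write two prompts to input.txt, one per line (layout is an assumption,
# not confirmed by this README).
printf '%s\n' "hello" "what is your name" > input.txt
cat input.txt
```

The file name would then be passed in place of the quoted text list, e.g. as the input_text_or_file argument of performance_single.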
Examples:

```
1. Test Llama-70B performance with pa_fp16, batch 16, [512,512], on 8 devices, using the unified code:

bash run.sh pa_fp16 performance [[512,512]] 16 llama [weight_dir] 8

2. Test Llama-65B performance with pa_fp16, batch 1, [256,256], on two nodes with 8 devices in total:

node0:
bash run.sh pa_fp16 performance [[256,256]] 1 llama [weight_dir] [rank_table_file] 8 2 0

node1:
bash run.sh pa_fp16 performance [[256,256]] 1 llama [weight_dir] [rank_table_file] 8 2 4

3. Test Llama-70B with the unified code on 8 devices with pa_fp16: find the max batch_size whose non-first-token latency stays below 50 ms, searching [600,700] for [256,256] and [300,400] for [512,512]:

bash run.sh pa_fp16 performance_maxbs [[256,256],[512,512]] [[600,700],[300,400]] 50 llama [weight_dir] 8

4. Test Starcoder-15.5B accuracy on the downstream dataset BoolQ with pa_fp16, batch 1, on 8 devices:

bash run.sh pa_fp16 full_BoolQ 1 starcoder [weight_dir] 8

5. Single-case performance/accuracy test: test Llama-7B performance with pa_fp16, [256,256], batch 2, with input texts "hello" and "what is your name", on 8 devices:

bash run.sh pa_fp16 performance_single [[256,256]] '["hello","what is your name"]' 2 llama [weight_dir] 8
```

## Special instructions for starcoder

- For 300I DUO, set the environment variables by modifying the prepare_environ function in core/starcoder_test.py:

```python
os.environ['ATB_LAUNCH_KERNEL_WITH_TILING'] = "1"
os.environ['LCCL_ENABLE_FALLBACK'] = "0"
os.environ['HCCL_OP_EXPANSION_MODE'] = "AI_CPU"
```

## Special instructions for baichuan2-13b

- For 300I DUO, set the environment variables by modifying the prepare_environ function in core/baichuan2_13b_test.py:

```python
os.environ['ATB_OPERATION_EXECUTE_ASYNC'] = "0"
os.environ['TASK_QUEUE_ENABLE'] = "0"
```

### Multi-node inference

- Enable "AIV" during performance testing to improve performance; if deterministic computation is required, it is recommended to disable "AIV"

```shell
export HCCL_OP_EXPANSION_MODE="AIV"
```

- To interrupt a run in progress, Ctrl+C does not work; use pkill to terminate the processes
- The commands must be launched on both machines at the same time
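
For example, processes can be killed by matching their full command line. The pattern below is an illustration; adjust it to match your own launch command:

```shell
# Send SIGKILL to every process whose full command line (-f) matches the
# pattern. "run.sh pa_fp16" is an example pattern; narrow it if other jobs
# on the machine could match. "|| true" keeps the script going when no
# matching process exists.
pkill -9 -f "run.sh pa_fp16" || true
echo "kill signal sent"
```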