diff --git a/README.md b/README.md
index aab67c01..0dd3d56e 100644
--- a/README.md
+++ b/README.md
@@ -48,19 +48,19 @@
 
 ## Supported Models
 
-| Model | Model size | Default module | Template |
-| -------------------------------------------------------- | --------------------------- | ----------------- |----------|
-| [LLaMA](https://github.com/facebookresearch/llama) | 7B/13B/33B/65B | q_proj,v_proj | - |
-| [LLaMA-2](https://huggingface.co/meta-llama) | 7B/13B/70B | q_proj,v_proj | llama2 |
-| [BLOOM](https://huggingface.co/bigscience/bloom) | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
-| [BLOOMZ](https://huggingface.co/bigscience/bloomz) | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
-| [Falcon](https://huggingface.co/tiiuae/falcon-7b) | 7B/40B | query_key_value | - |
-| [Baichuan](https://github.com/baichuan-inc/baichuan-13B) | 7B/13B | W_pack | baichuan |
-| [Baichuan2](https://github.com/baichuan-inc/Baichuan2) | 7B/13B | W_pack | baichuan |
-| [InternLM](https://github.com/InternLM/InternLM) | 7B | q_proj,v_proj | intern |
-| [Qwen](https://github.com/QwenLM/Qwen-7B) | 7B | c_attn | chatml |
-| [XVERSE](https://github.com/xverse-ai/XVERSE-13B) | 13B | q_proj,v_proj | xverse |
-| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | 6B | query_key_value | chatglm2 |
+| Model | Model size | Default module | Template |
+| -------------------------------------------------------- | --------------------------- | ----------------- | --------- |
+| [LLaMA](https://github.com/facebookresearch/llama) | 7B/13B/33B/65B | q_proj,v_proj | - |
+| [LLaMA-2](https://huggingface.co/meta-llama) | 7B/13B/70B | q_proj,v_proj | llama2 |
+| [BLOOM](https://huggingface.co/bigscience/bloom) | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
+| [BLOOMZ](https://huggingface.co/bigscience/bloomz) | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
+| [Falcon](https://huggingface.co/tiiuae/falcon-7b) | 7B/40B | query_key_value | - |
+| [Baichuan](https://github.com/baichuan-inc/baichuan-13B) | 7B/13B | W_pack | baichuan |
+| [Baichuan2](https://github.com/baichuan-inc/Baichuan2) | 7B/13B | W_pack | baichuan2 |
+| [InternLM](https://github.com/InternLM/InternLM) | 7B | q_proj,v_proj | intern |
+| [Qwen](https://github.com/QwenLM/Qwen-7B) | 7B | c_attn | chatml |
+| [XVERSE](https://github.com/xverse-ai/XVERSE-13B) | 13B | q_proj,v_proj | xverse |
+| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | 6B | query_key_value | chatglm2 |
 
 - **Default module** is used for the `--lora_target` argument. Please use `python src/train_bash.py -h` to see all available options.
 - For the "base" models, the `--template` argument can be chosen from `default`, `alpaca`, `vicuna` etc. But make sure to use the corresponding template for the "chat" models.
diff --git a/README_zh.md b/README_zh.md
index 0ca5c8b8..90079ead 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -48,19 +48,19 @@
 
 ## 模型
 
-| 模型名 | 模型大小 | 默认模块 | Template |
-| -------------------------------------------------------- | --------------------------- | ----------------- |----------|
-| [LLaMA](https://github.com/facebookresearch/llama) | 7B/13B/33B/65B | q_proj,v_proj | - |
-| [LLaMA-2](https://huggingface.co/meta-llama) | 7B/13B/70B | q_proj,v_proj | llama2 |
-| [BLOOM](https://huggingface.co/bigscience/bloom) | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
-| [BLOOMZ](https://huggingface.co/bigscience/bloomz) | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
-| [Falcon](https://huggingface.co/tiiuae/falcon-7b) | 7B/40B | query_key_value | - |
-| [Baichuan](https://github.com/baichuan-inc/baichuan-13B) | 7B/13B | W_pack | baichuan |
-| [Baichuan2](https://github.com/baichuan-inc/Baichuan2) | 7B/13B | W_pack | baichuan |
-| [InternLM](https://github.com/InternLM/InternLM) | 7B | q_proj,v_proj | intern |
-| [Qwen](https://github.com/QwenLM/Qwen-7B) | 7B | c_attn | chatml |
-| [XVERSE](https://github.com/xverse-ai/XVERSE-13B) | 13B | q_proj,v_proj | xverse |
-| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | 6B | query_key_value | chatglm2 |
+| 模型名 | 模型大小 | 默认模块 | Template |
+| -------------------------------------------------------- | --------------------------- | ----------------- | --------- |
+| [LLaMA](https://github.com/facebookresearch/llama) | 7B/13B/33B/65B | q_proj,v_proj | - |
+| [LLaMA-2](https://huggingface.co/meta-llama) | 7B/13B/70B | q_proj,v_proj | llama2 |
+| [BLOOM](https://huggingface.co/bigscience/bloom) | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
+| [BLOOMZ](https://huggingface.co/bigscience/bloomz) | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
+| [Falcon](https://huggingface.co/tiiuae/falcon-7b) | 7B/40B | query_key_value | - |
+| [Baichuan](https://github.com/baichuan-inc/baichuan-13B) | 7B/13B | W_pack | baichuan |
+| [Baichuan2](https://github.com/baichuan-inc/Baichuan2) | 7B/13B | W_pack | baichuan2 |
+| [InternLM](https://github.com/InternLM/InternLM) | 7B | q_proj,v_proj | intern |
+| [Qwen](https://github.com/QwenLM/Qwen-7B) | 7B | c_attn | chatml |
+| [XVERSE](https://github.com/xverse-ai/XVERSE-13B) | 13B | q_proj,v_proj | xverse |
+| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | 6B | query_key_value | chatglm2 |
 
 - **默认模块**是 `--lora_target` 参数的部分可选项。请使用 `python src/train_bash.py -h` 查看全部可选项。
 - 对于所有“基座”(Base)模型,`--template` 参数可以是 `default`, `alpaca`, `vicuna` 等任意值。但“对话”(Chat)模型请务必使用对应的模板。
diff --git a/src/llmtuner/extras/constants.py b/src/llmtuner/extras/constants.py
index 6c8bc8e3..f042f76d 100644
--- a/src/llmtuner/extras/constants.py
+++ b/src/llmtuner/extras/constants.py
@@ -78,7 +78,7 @@ DEFAULT_TEMPLATE = {
     "LLaMA2": "llama2",
     "ChineseLLaMA2": "llama2_zh",
     "Baichuan": "baichuan",
-    "Baichuan2": "baichuan",
+    "Baichuan2": "baichuan2",
     "InternLM": "intern",
     "Qwen": "chatml",
     "XVERSE": "xverse",
diff --git a/src/llmtuner/extras/template.py b/src/llmtuner/extras/template.py
index b4af406c..e479fa76 100644
--- a/src/llmtuner/extras/template.py
+++ b/src/llmtuner/extras/template.py
@@ -516,6 +516,49 @@ register_template(
 )
 
 
+r"""
+Supports: https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat
+          https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat
+Used for training and inference of the fine-tuned models.
+"""
+register_template(
+    name="baichuan2",
+    prefix=[
+        "{{system}}"
+    ],
+    prompt=[
+        {"token": ""}, # user token
+        "{{query}}",
+        {"token": ""} # assistant token
+    ],
+    system="",
+    sep=[]
+)
+
+
+r"""
+Supports: https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat
+          https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat
+Used for inference of the original model.
+"""
+register_template(
+    name="baichuan2_eval",
+    prefix=[
+        "{{system}}",
+        {"token": ""} # user token
+    ],
+    prompt=[
+        "{{query}}",
+        {"token": ""} # assistant token
+    ],
+    system="",
+    sep=[],
+    stop_words=[
+        "" # user token
+    ]
+)
+
+
 r"""
 Supports: https://huggingface.co/HuggingFaceH4/starchat-alpha
           https://huggingface.co/HuggingFaceH4/starchat-beta