Merge pull request #382 from hiyouga/feature-updateReadme

add detailed model configs
hoshi-hiyouga 2023-08-07 13:43:38 +08:00 committed by GitHub
commit da42d289ee
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 20 additions and 16 deletions

@@ -41,14 +41,16 @@
 [23/05/31] Now we support training the **BLOOM & BLOOMZ** models in this repo. Try `--model_name_or_path bigscience/bloomz-7b1-mt` and `--lora_target query_key_value` arguments to use the BLOOMZ model.
 ## Supported Models
-- [LLaMA](https://github.com/facebookresearch/llama) (7B/13B/33B/65B)
-- [LLaMA-2](https://huggingface.co/meta-llama) (7B/13B/70B)
-- [BLOOM](https://huggingface.co/bigscience/bloom) & [BLOOMZ](https://huggingface.co/bigscience/bloomz) (560M/1.1B/1.7B/3B/7.1B/176B)
-- [Falcon](https://huggingface.co/tiiuae/falcon-7b) (7B/40B)
-- [Baichuan](https://huggingface.co/baichuan-inc/baichuan-7B) (7B/13B)
-- [InternLM](https://github.com/InternLM/InternLM) (7B)
-- [Qwen](https://github.com/QwenLM/Qwen-7B) (7B)
+| model | model size | model_name_or_path | lora_target | template |
+|-------------------------------------------------------------|-----------------------------|--------------------------------|-------------------|----------|
+| [LLaMA](https://github.com/facebookresearch/llama) | 7B/13B/33B/65B | - | q_proj,v_proj | default |
+| [LLaMA-2](https://huggingface.co/meta-llama) | 7B/13B/70B | meta-llama/Llama-2-7b-hf | q_proj,v_proj | llama2 |
+| [BLOOM](https://huggingface.co/bigscience/bloom) | 560M/1.1B/1.7B/3B/7.1B/176B | bigscience/bloom-7b1 | query_key_value | default |
+| [BLOOMZ](https://huggingface.co/bigscience/bloomz) | 560M/1.1B/1.7B/3B/7.1B/176B | bigscience/bloomz-7b1-mt | query_key_value | default |
+| [Falcon](https://huggingface.co/tiiuae/falcon-7b) | 7B/40B | tiiuae/falcon-7b | query_key_value | default |
+| [Baichuan](https://huggingface.co/baichuan-inc/baichuan-7B) | 7B/13B | baichuan-inc/Baichuan-13B-Chat | W_pack | baichuan |
+| [InternLM](https://github.com/InternLM/InternLM) | 7B | internlm/internlm-7b | q_proj,v_proj | intern |
+| [Qwen](https://github.com/QwenLM/Qwen-7B) | 7B | Qwen/Qwen-7B-Chat | c_attn | chatml |
 ## Supported Training Approaches

@@ -41,14 +41,16 @@
 [23/05/31] We now support training the **BLOOM & BLOOMZ** models. Try the `--model_name_or_path bigscience/bloomz-7b1-mt` and `--lora_target query_key_value` arguments.
 ## Models
-- [LLaMA](https://github.com/facebookresearch/llama) (7B/13B/33B/65B)
-- [LLaMA-2](https://huggingface.co/meta-llama) (7B/13B/70B)
-- [BLOOM](https://huggingface.co/bigscience/bloom) & [BLOOMZ](https://huggingface.co/bigscience/bloomz) (560M/1.1B/1.7B/3B/7.1B/176B)
-- [Falcon](https://huggingface.co/tiiuae/falcon-7b) (7B/40B)
-- [Baichuan](https://huggingface.co/baichuan-inc/baichuan-7B) (7B/13B)
-- [InternLM](https://github.com/InternLM/InternLM) (7B)
-- [Qwen](https://github.com/QwenLM/Qwen-7B) (7B)
+| model | model size | model_name_or_path | lora_target | template |
+|-------------------------------------------------------------|-----------------------------|--------------------------------|-------------------|----------|
+| [LLaMA](https://github.com/facebookresearch/llama) | 7B/13B/33B/65B | - | q_proj,v_proj | default |
+| [LLaMA-2](https://huggingface.co/meta-llama) | 7B/13B/70B | meta-llama/Llama-2-7b-hf | q_proj,v_proj | llama2 |
+| [BLOOM](https://huggingface.co/bigscience/bloom) | 560M/1.1B/1.7B/3B/7.1B/176B | bigscience/bloom-7b1 | query_key_value | default |
+| [BLOOMZ](https://huggingface.co/bigscience/bloomz) | 560M/1.1B/1.7B/3B/7.1B/176B | bigscience/bloomz-7b1-mt | query_key_value | default |
+| [Falcon](https://huggingface.co/tiiuae/falcon-7b) | 7B/40B | tiiuae/falcon-7b | query_key_value | default |
+| [Baichuan](https://huggingface.co/baichuan-inc/baichuan-7B) | 7B/13B | baichuan-inc/Baichuan-13B-Chat | W_pack | baichuan |
+| [InternLM](https://github.com/InternLM/InternLM) | 7B | internlm/internlm-7b | q_proj,v_proj | intern |
+| [Qwen](https://github.com/QwenLM/Qwen-7B) | 7B | Qwen/Qwen-7B-Chat | c_attn | chatml |
 ## Fine-Tuning Methods
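
The table introduced by this change pairs each model family with a `model_name_or_path`, a `lora_target`, and a `template`. As a rough illustration of how those values could be turned into the command-line arguments mentioned in the [23/05/31] entry, here is a minimal Python sketch. The names `ModelDefaults`, `MODEL_DEFAULTS`, and `build_args` are hypothetical and not part of the repository. The `--template` flag name is an assumption inferred from the column header; only `--model_name_or_path` and `--lora_target` appear in the README text shown here.

```python
from typing import Dict, List, NamedTuple, Optional


class ModelDefaults(NamedTuple):
    """One row of the README table (hypothetical container, for illustration)."""
    model_name_or_path: Optional[str]  # None where the table shows "-"
    lora_target: str
    template: str


# Values copied from the table added in this change.
MODEL_DEFAULTS: Dict[str, ModelDefaults] = {
    "LLaMA": ModelDefaults(None, "q_proj,v_proj", "default"),
    "LLaMA-2": ModelDefaults("meta-llama/Llama-2-7b-hf", "q_proj,v_proj", "llama2"),
    "BLOOM": ModelDefaults("bigscience/bloom-7b1", "query_key_value", "default"),
    "BLOOMZ": ModelDefaults("bigscience/bloomz-7b1-mt", "query_key_value", "default"),
    "Falcon": ModelDefaults("tiiuae/falcon-7b", "query_key_value", "default"),
    "Baichuan": ModelDefaults("baichuan-inc/Baichuan-13B-Chat", "W_pack", "baichuan"),
    "InternLM": ModelDefaults("internlm/internlm-7b", "q_proj,v_proj", "intern"),
    "Qwen": ModelDefaults("Qwen/Qwen-7B-Chat", "c_attn", "chatml"),
}


def build_args(model: str, model_name_or_path: Optional[str] = None) -> List[str]:
    """Return CLI arguments for a model family, using the table values as defaults.

    An explicit model_name_or_path overrides the table entry; it is required
    for rows where the table lists no path (shown as "-").
    """
    defaults = MODEL_DEFAULTS[model]
    path = model_name_or_path or defaults.model_name_or_path
    if path is None:
        raise ValueError(f"{model} has no default path; pass model_name_or_path")
    return [
        "--model_name_or_path", path,
        "--lora_target", defaults.lora_target,
        "--template", defaults.template,  # flag name assumed from the column header
    ]
```

For example, `build_args("BLOOMZ")` yields the `--model_name_or_path bigscience/bloomz-7b1-mt` and `--lora_target query_key_value` pair from the changelog entry, plus the assumed `--template default`. The LLaMA row has no listed path, so the sketch asks the caller to supply one rather than guessing.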