update readme

2023-08-12 21:00:11 +08:00 · 1836c020c5
parent fa940c17b8
commit 1836c020c5
3 changed files with 6 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -12,6 +12,8 @@

 ## Changelog

+[23/08/12] Now we support **RoPE scaling** to extend the context length of the LLaMA models. Try the `--rope_scaling linear` argument in training and the `--rope_scaling dynamic` argument at inference to extrapolate the position embeddings.
+
 [23/08/11] Now we support **[DPO training](https://arxiv.org/abs/2305.18290)** for instruction-tuned models. See [this example](#dpo-training) to train your models (experimental feature).

 [23/08/03] Now we support training the **Qwen-7B** model in this repo. Try `--model_name_or_path Qwen/Qwen-7B-Chat` and `--lora_target c_attn` arguments to train the Qwen-7B model. Remember to use `--template chatml` argument when you are using the Qwen-7B-Chat model.
--- a/README_zh.md
+++ b/README_zh.md
@@ -12,6 +12,8 @@

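 ## Changelog

+[23/08/12] We now support **RoPE interpolation** to extend the context length of LLaMA models. Try training with the `--rope_scaling linear` argument or evaluating with the `--rope_scaling dynamic` argument.
+
 [23/08/11] We now support **[DPO training](https://arxiv.org/abs/2305.18290)** for instruction-tuned models. See [this example](#dpo-训练) for details (experimental feature).

 [23/08/03] We now support training the **Qwen-7B** model. Try the `--model_name_or_path Qwen/Qwen-7B-Chat` and `--lora_target c_attn` arguments. Add the `--template chatml` argument when using the Qwen-7B-Chat model.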
--- a/src/llmtuner/tuner/core/loader.py
+++ b/src/llmtuner/tuner/core/loader.py
@@ -83,6 +83,8 @@ def load_model_and_tokenizer(

    # Set RoPE scaling
    if model_args.rope_scaling is not None:
+        require_version("transformers>=4.31.0", "RoPE scaling requires transformers>=4.31.0")
+
        if hasattr(config, "use_dynamic_ntk"): # for Qwen models
            if is_trainable:
                logger.warning("Qwen model does not support rope scaling in training.")
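
For readers unfamiliar with the two `--rope_scaling` modes named in the changelog entry, the sketch below illustrates the general idea behind each. It is not code from this repository: `rope_angles` and `dynamic_ntk_base` are hypothetical helper names, the default `dim`, `base`, and `factor` values are arbitrary, and the dynamic formula follows the commonly used NTK-aware base rescaling, which is assumed here rather than taken from the diff.

```python
def rope_angles(position, dim=8, base=10000.0, linear_factor=1.0):
    """Rotary angles for one position (hypothetical helper).

    Linear scaling divides the position index by the factor, so a longer
    sequence is squeezed into the position range seen during pre-training.
    """
    scaled_pos = position / linear_factor
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    return [scaled_pos * f for f in inv_freq]


def dynamic_ntk_base(seq_len, max_pos, dim=8, base=10000.0, factor=2.0):
    """Rescaled rotary base for dynamic NTK scaling (assumed formula).

    The base is left unchanged while the sequence fits the original
    context window and grows only once seq_len exceeds max_pos.
    """
    if seq_len <= max_pos:
        return base
    scale = (factor * seq_len / max_pos) - (factor - 1)
    return base * scale ** (dim / (dim - 2))


if __name__ == "__main__":
    # With a linear factor of 2, position 4 gets the angles of position 2.
    print(rope_angles(4, linear_factor=2.0) == rope_angles(2))
    # Inside the original window the base is unchanged; beyond it, it grows.
    print(dynamic_ntk_base(1024, 2048), dynamic_ntk_base(4096, 2048))
```

Linear scaling changes the angles at every position, which is one plausible reason the changelog pairs it with training, while the dynamic variant leaves short sequences untouched and so can be applied at inference only.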