diff --git a/README.md b/README.md
index 0ec5f8d2..535deefb 100644
--- a/README.md
+++ b/README.md
@@ -39,12 +39,12 @@ https://github.com/hiyouga/LLaMA-Factory/assets/16256802/6ba60acc-e2e2-4bec-b846
 
 ## Benchmark
 
-Compared to ChatGLM's [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/ptuning), LLaMA-Factory's LoRA tuning offers up to **3.7 times faster** training speed with a better BLEU score on the advertising text generation task. By leveraging 4-bit quantization technique, LLaMA-Factory's QLoRA further improves the efficiency regarding the GPU memory.
+Compared to ChatGLM's [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/ptuning), LLaMA-Factory's LoRA tuning offers up to **3.7 times faster** training speed with a better Rouge score on the advertising text generation task. By leveraging the 4-bit quantization technique, LLaMA-Factory's QLoRA further reduces peak GPU memory usage.
 
 ![benchmark](assets/benchmark.svg)
 
 - **Training Speed**: the number of training samples processed per second during the training. (bs=4, cutoff_len=1024)
-- **BLEU Score**: BLEU-4 score on the development set of the [advertising text generation](https://aclanthology.org/D19-1321.pdf) task. (bs=4, cutoff_len=1024)
+- **Rouge Score**: Rouge-2 score on the development set of the [advertising text generation](https://aclanthology.org/D19-1321.pdf) task. (bs=4, cutoff_len=1024)
 - **GPU Memory**: Peak GPU memory usage in 4-bit quantized training. (bs=1, cutoff_len=1024)
 - We adopt `pre_seq_len=128` for ChatGLM's P-Tuning and `lora_rank=32` for LLaMA-Factory's LoRA tuning.
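The change above swaps the reported metric from BLEU-4 to Rouge-2. As a reader's aid only: Rouge-2 is the F1 over bigram overlap between a candidate and a reference. The sketch below is a minimal illustration of that computation, not the evaluation code LLaMA-Factory uses; the `rouge_2` helper and the example sentences are our own, and real evaluations typically use a dedicated package with tokenization and stemming options.

```python
from collections import Counter

def rouge_2(reference: str, candidate: str) -> float:
    """Rouge-2 F1: clipped bigram overlap between reference and candidate."""
    def bigrams(text: str) -> Counter:
        tokens = text.split()
        return Counter(zip(tokens, tokens[1:]))

    ref, cand = bigrams(reference), bigrams(candidate)
    if not ref or not cand:
        return 0.0
    # Counter intersection clips each bigram's count to the smaller side.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge_2("the quick brown fox jumps over the lazy dog",
              "the quick brown fox leaps over the lazy dog"))  # → 0.75
```

Here 6 of 8 bigrams match on each side (the two bigrams touching the differing verb miss), so precision, recall, and F1 all equal 0.75.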
diff --git a/README_zh.md b/README_zh.md
index ec3882e0..10418c3d 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -39,12 +39,12 @@ https://github.com/hiyouga/LLaMA-Factory/assets/16256802/6ba60acc-e2e2-4bec-b846
 
 ## 性能指标
 
-与 ChatGLM 官方的 [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/ptuning) 微调相比,LLaMA-Factory 的 LoRA 微调提供了 **3.7 倍**的加速比,同时在广告文案生成任务上取得了更高的 BLEU 分数。结合 4 比特量化技术,LLaMA-Factory 的 QLoRA 微调进一步降低了 GPU 显存消耗。
+与 ChatGLM 官方的 [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/ptuning) 微调相比,LLaMA-Factory 的 LoRA 微调提供了 **3.7 倍**的加速比,同时在广告文案生成任务上取得了更高的 Rouge 分数。结合 4 比特量化技术,LLaMA-Factory 的 QLoRA 微调进一步降低了 GPU 显存消耗。
 
 ![benchmark](assets/benchmark.svg)
 
 - **Training Speed**: 训练阶段每秒处理的样本数量。(批处理大小=4,截断长度=1024)
-- **BLEU Score**: [广告文案生成](https://aclanthology.org/D19-1321.pdf)任务验证集上的 BLEU-4 分数。(批处理大小=4,截断长度=1024)
+- **Rouge Score**: [广告文案生成](https://aclanthology.org/D19-1321.pdf)任务验证集上的 Rouge-2 分数。(批处理大小=4,截断长度=1024)
 - **GPU Memory**: 4 比特量化训练的 GPU 显存峰值。(批处理大小=1,截断长度=1024)
 - 我们在 ChatGLM 的 P-Tuning 中采用 `pre_seq_len=128`,在 LLaMA-Factory 的 LoRA 微调中采用 `lora_rank=32`。
diff --git a/assets/benchmark.svg b/assets/benchmark.svg
index 0bf8f9f4..60f0aa4d 100644
--- a/assets/benchmark.svg
+++ b/assets/benchmark.svg
[Hunks omitted: machine-generated SVG path data for the regenerated benchmark chart. The only human-readable changes are the embedded timestamp (2023-11-18T09:17:37.531653 → 2023-11-18T11:28:03.028228) and the redrawn bars and axis labels reflecting the BLEU → Rouge relabeling.]