From 63611de7ae09cd9578fcb9c6408035ec6bfb2cb2 Mon Sep 17 00:00:00 2001
From: hiyouga
Date: Sun, 10 Sep 2023 21:01:20 +0800
Subject: [PATCH] update readme

---
 README.md    | 15 ++++++++-------
 README_zh.md | 15 ++++++++-------
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/README.md b/README.md
index 2089f51b..4fa7db19 100644
--- a/README.md
+++ b/README.md
@@ -64,7 +64,7 @@
 | [XVERSE](https://github.com/xverse-ai/XVERSE-13B) | 13B | q_proj,v_proj | xverse |
 | [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | 6B | query_key_value | chatglm2 |
 
-> **Note**
+> [!NOTE]
 > **Default module** is used for the `--lora_target` argument, you can use `--lora_target all` to specify all the available modules.
 >
 > For the "base" models, the `--template` argument can be chosen from `default`, `alpaca`, `vicuna` etc. But make sure to use the corresponding template for the "chat" models.
@@ -79,7 +79,7 @@
 | PPO Training | | | :white_check_mark: | :white_check_mark: |
 | DPO Training | :white_check_mark: | | :white_check_mark: | :white_check_mark: |
 
-> **Note**
+> [!NOTE]
 > Use `--quantization_bit 4/8` argument to enable QLoRA.
 
 ## Provided Datasets
@@ -143,7 +143,7 @@ And **powerful GPUs**!
 
 Please refer to `data/example_dataset` for checking the details about the format of dataset files. You can either use a single `.json` file or a [dataset loading script](https://huggingface.co/docs/datasets/dataset_script) with multiple files to create a custom dataset.
 
-> **Note**
+> [!NOTE]
 > Please update `data/dataset_info.json` to use your custom dataset. About the format of this file, please refer to `data/README.md`.
 
 ### Dependence Installation (optional)
@@ -170,12 +170,12 @@ CUDA_VISIBLE_DEVICES=0 python src/train_web.py
 
 We strongly recommend using the all-in-one Web UI for newcomers since it can also generate training scripts **automatically**.
 
-> **Warning**
+> [!WARNING]
 > Currently the web UI only supports training on **a single GPU**.
 
 ### Train on a single GPU
 
-> **Warning**
+> [!IMPORTANT]
 > If you want to train models on multiple GPUs, please refer to [Distributed Training](#distributed-training).
 
 #### Pre-Training
@@ -344,6 +344,7 @@ deepspeed --num_gpus 8 --master_port=9901 src/train_bash.py \
 
 ```json
 {
+  "train_batch_size": "auto",
   "train_micro_batch_size_per_gpu": "auto",
   "gradient_accumulation_steps": "auto",
   "gradient_clipping": "auto",
@@ -391,7 +392,7 @@ python src/api_demo.py \
     --checkpoint_dir path_to_checkpoint
 ```
 
-> **Note**
+> [!NOTE]
 > Visit `http://localhost:8000/docs` for API documentation.
 
 ### CLI Demo
@@ -431,7 +432,7 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
     --predict_with_generate
 ```
 
-> **Note**
+> [!NOTE]
 > We recommend using `--per_device_eval_batch_size=1` and `--max_target_length 128` at 4/8-bit evaluation.
 
 ### Predict
diff --git a/README_zh.md b/README_zh.md
index 423a7fce..c7b851d1 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -64,7 +64,7 @@
 | [XVERSE](https://github.com/xverse-ai/XVERSE-13B) | 13B | q_proj,v_proj | xverse |
 | [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | 6B | query_key_value | chatglm2 |
 
-> **Note**
+> [!NOTE]
 > **默认模块**应作为 `--lora_target` 参数的默认值，可使用 `--lora_target all` 参数指定全部模块。
 >
 > 对于所有“基座”（Base）模型，`--template` 参数可以是 `default`, `alpaca`, `vicuna` 等任意值。但“对话”（Chat）模型请务必使用对应的模板。
@@ -79,7 +79,7 @@
 | PPO 训练 | | | :white_check_mark: | :white_check_mark: |
 | DPO 训练 | :white_check_mark: | | :white_check_mark: | :white_check_mark: |
 
-> **Note**
+> [!NOTE]
 > 请使用 `--quantization_bit 4/8` 参数来启用 QLoRA 训练。
 
 ## 数据集
@@ -143,7 +143,7 @@ huggingface-cli login
 
 关于数据集文件的格式，请参考 `data/example_dataset` 文件夹的内容。构建自定义数据集时，既可以使用单个 `.json` 文件，也可以使用一个[数据加载脚本](https://huggingface.co/docs/datasets/dataset_script)和多个文件。
 
-> **Note**
+> [!NOTE]
 > 使用自定义数据集时，请更新 `data/dataset_info.json` 文件，该文件的格式请参考 `data/README.md`。
 
 ### 环境搭建（可跳过）
@@ -170,12 +170,12 @@ CUDA_VISIBLE_DEVICES=0 python src/train_web.py
 
 我们极力推荐新手使用浏览器一体化界面，因为它还可以**自动**生成运行所需的命令行脚本。
 
-> **Warning**
+> [!WARNING]
 > 目前网页 UI 仅支持**单卡训练**。
 
 ### 单 GPU 训练
 
-> **Warning**
+> [!IMPORTANT]
 > 如果您使用多张 GPU 训练模型，请移步[多 GPU 分布式训练](#多-gpu-分布式训练)部分。
 
 #### 预训练
@@ -343,6 +343,7 @@ deepspeed --num_gpus 8 --master_port=9901 src/train_bash.py \
 
 ```json
 {
+  "train_batch_size": "auto",
   "train_micro_batch_size_per_gpu": "auto",
   "gradient_accumulation_steps": "auto",
   "gradient_clipping": "auto",
@@ -390,7 +391,7 @@ python src/api_demo.py \
     --checkpoint_dir path_to_checkpoint
 ```
 
-> **Note**
+> [!NOTE]
 > 关于 API 文档请见 `http://localhost:8000/docs`。
 
 ### 命令行测试
@@ -430,7 +431,7 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
     --predict_with_generate
 ```
 
-> **Note**
+> [!NOTE]
 > 我们建议在量化模型的评估中使用 `--per_device_eval_batch_size=1` 和 `--max_target_length 128`。
 
 ### 模型预测
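
Note: after this patch, the opening of the DeepSpeed config fragment shown in the `train_bash.py` hunks reads as below. This is only the portion visible in the hunk context (the keys following `gradient_clipping` are outside the diff and omitted here); `"auto"` defers each value to the HuggingFace `TrainingArguments` settings, which is why the added `train_batch_size` key can be left unspecified on the command line.

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto"
}
```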