update readme

2024-03-28 22:02:32 +08:00 · c1fe6ce782
parent 1e43319f9c
commit c1fe6ce782
3 changed files with 50 additions and 14 deletions
--- a/README.md
+++ b/README.md
@@ -76,10 +76,10 @@ Compared to ChatGLM's [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/

 [24/03/07] We supported gradient low-rank projection (**[GaLore](https://arxiv.org/abs/2403.03507)**) algorithm. See `examples/extras/galore` for usage.

-[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `--infer_backend vllm` to enjoy **270%** inference speed. (LoRA is not yet supported, merge it first.)
-
 <details><summary>Full Changelog</summary>

+[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `--infer_backend vllm` to enjoy **270%** inference speed. (LoRA is not yet supported, merge it first.)
+
 [24/02/28] We supported weight-decomposed LoRA (**[DoRA](https://arxiv.org/abs/2402.09353)**). Try `--use_dora` to activate DoRA training.

 [24/02/15] We supported **block expansion** proposed by [LLaMA Pro](https://github.com/TencentARC/LLaMA-Pro). See `examples/extras/llama_pro` for usage.
@@ -586,7 +586,7 @@ CUDA_VISIBLE_DEVICES= python src/export_model.py \
 > [!TIP]
 > Use `--model_name_or_path path_to_export` solely to use the exported model.
 > 
-> Use `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ after merging the LoRA weights.
+> Use `CUDA_VISIBLE_DEVICES=0`, `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ after merging the LoRA weights.

 ### Inference with OpenAI-style API

@@ -662,19 +662,23 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \

 ### Dockerize Training

-#### Get ready
-
-Necessary dockerized environment is needed, such as Docker or Docker Compose.
-
-#### Docker support
+#### Use Docker

 ```bash
 docker build -f ./Dockerfile -t llama-factory:latest .

-docker run --gpus=all -v ./hf_cache:/root/.cache/huggingface/ -v ./data:/app/data -v ./output:/app/output -p 7860:7860 --shm-size 16G --name llama_factory -d llama-factory:latest
+docker run --gpus=all \
+    -v ./hf_cache:/root/.cache/huggingface/ \
+    -v ./data:/app/data \
+    -v ./output:/app/output \
+    -e CUDA_VISIBLE_DEVICES=0 \
+    -p 7860:7860 \
+    --shm-size 16G \
+    --name llama_factory \
+    -d llama-factory:latest
 ```

-#### Docker Compose support
+#### Use Docker Compose

 ```bash
 docker compose -f ./docker-compose.yml up -d
@@ -682,7 +686,7 @@ docker compose -f ./docker-compose.yml up -d

 > [!TIP]
 > Details about volume:
-> * hf_cache: Utilize Huggingface cache on the host machine. Reassignable if a cache already exists in a different directory.
+> * hf_cache: Utilize Hugging Face cache on the host machine. Reassignable if a cache already exists in a different directory.
 > * data: Place datasets on this dir of the host machine so that they can be selected on LLaMA Board GUI.
 > * output: Set export dir to this location so that the merged result can be accessed directly on the host machine.

--- a/README_zh.md
+++ b/README_zh.md
@@ -76,10 +76,10 @@ https://github.com/hiyouga/LLaMA-Factory/assets/16256802/ec36a9dd-37f4-4f72-81bd

 [24/03/07] 我们支持了梯度低秩投影（**[GaLore](https://arxiv.org/abs/2403.03507)**）算法。详细用法请参照 `examples/extras/galore`。

-[24/03/07] 我们集成了 **[vLLM](https://github.com/vllm-project/vllm)** 以实现极速并发推理。请使用 `--infer_backend vllm` 来获得 **270%** 的推理速度。（尚不支持 LoRA，请先合并权重。）
-
 <details><summary>展开日志</summary>

+[24/03/07] 我们集成了 **[vLLM](https://github.com/vllm-project/vllm)** 以实现极速并发推理。请使用 `--infer_backend vllm` 来获得 **270%** 的推理速度。（尚不支持 LoRA，请先合并权重。）
+
 [24/02/28] 我们支持了 **[DoRA](https://arxiv.org/abs/2402.09353)** 微调。请使用 `--use_dora` 参数进行 DoRA 微调。

 [24/02/15] 我们支持了 [LLaMA Pro](https://github.com/TencentARC/LLaMA-Pro) 提出的**块扩展**方法。详细用法请参照 `examples/extras/llama_pro`。
@@ -585,7 +585,7 @@ CUDA_VISIBLE_DEVICES= python src/export_model.py \
 > [!TIP]
 > 仅使用 `--model_name_or_path path_to_export` 来加载导出后的模型。
 > 
-> 合并 LoRA 权重之后可再次使用 `--export_quantization_bit 4` 和 `--export_quantization_dataset data/c4_demo.json` 基于 AutoGPTQ 量化模型。
+> 合并 LoRA 权重之后可再次使用 `CUDA_VISIBLE_DEVICES=0`、`--export_quantization_bit 4` 和 `--export_quantization_dataset data/c4_demo.json` 基于 AutoGPTQ 量化模型。

 ### 使用 OpenAI 风格 API 推理

@@ -659,6 +659,36 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
 > [!TIP]
 > 我们建议在量化模型的预测中使用 `--per_device_eval_batch_size=1` 和 `--max_target_length 128`。

+### 使用容器
+
+#### 使用 Docker
+
+```bash
+docker build -f ./Dockerfile -t llama-factory:latest .
+
+docker run --gpus=all \
+    -v ./hf_cache:/root/.cache/huggingface/ \
+    -v ./data:/app/data \
+    -v ./output:/app/output \
+    -e CUDA_VISIBLE_DEVICES=0 \
+    -p 7860:7860 \
+    --shm-size 16G \
+    --name llama_factory \
+    -d llama-factory:latest
+```
+
+#### 使用 Docker Compose
+
+```bash
+docker compose -f ./docker-compose.yml up -d
+```
+
+> [!TIP]
+> 数据卷详情：
+> * hf_cache：使用宿主机的 Hugging Face 缓存文件夹，允许更改为新的目录。
+> * data：宿主机中存放数据集的文件夹路径。
+> * output：将导出目录设置为该路径后，即可在宿主机中访问导出后的模型。
+
 ## 使用了 LLaMA Factory 的项目

 1. Wang et al. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation. 2023. [[arxiv]](https://arxiv.org/abs/2308.02223)
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -10,6 +10,8 @@ services:
      - ./hf_cache:/root/.cache/huggingface/
      - ./data:/app/data
      - ./output:/app/output
+    environment:
+      - CUDA_VISIBLE_DEVICES=0
    ports:
      - "7860:7860"
    ipc: host