update readme
This commit is contained in:
parent
1e43319f9c
commit
c1fe6ce782
26
README.md
26
README.md
|
@ -76,10 +76,10 @@ Compared to ChatGLM's [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/
|
|||
|
||||
[24/03/07] We supported gradient low-rank projection (**[GaLore](https://arxiv.org/abs/2403.03507)**) algorithm. See `examples/extras/galore` for usage.
|
||||
|
||||
[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `--infer_backend vllm` to enjoy **270%** inference speed. (LoRA is not yet supported, merge it first.)
|
||||
|
||||
<details><summary>Full Changelog</summary>
|
||||
|
||||
[24/03/07] We integrated **[vLLM](https://github.com/vllm-project/vllm)** for faster and concurrent inference. Try `--infer_backend vllm` to enjoy **270%** inference speed. (LoRA is not yet supported, merge it first.)
|
||||
|
||||
[24/02/28] We supported weight-decomposed LoRA (**[DoRA](https://arxiv.org/abs/2402.09353)**). Try `--use_dora` to activate DoRA training.
|
||||
|
||||
[24/02/15] We supported **block expansion** proposed by [LLaMA Pro](https://github.com/TencentARC/LLaMA-Pro). See `examples/extras/llama_pro` for usage.
|
||||
|
@ -586,7 +586,7 @@ CUDA_VISIBLE_DEVICES= python src/export_model.py \
|
|||
> [!TIP]
|
||||
> Use `--model_name_or_path path_to_export` solely to use the exported model.
|
||||
>
|
||||
> Use `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ after merging the LoRA weights.
|
||||
> Use `CUDA_VISIBLE_DEVICES=0`, `--export_quantization_bit 4` and `--export_quantization_dataset data/c4_demo.json` to quantize the model with AutoGPTQ after merging the LoRA weights.
|
||||
|
||||
### Inference with OpenAI-style API
|
||||
|
||||
|
@ -662,19 +662,23 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
|
|||
|
||||
### Dockerize Training
|
||||
|
||||
#### Get ready
|
||||
|
||||
Necessary dockerized environment is needed, such as Docker or Docker Compose.
|
||||
|
||||
#### Docker support
|
||||
#### Use Docker
|
||||
|
||||
```bash
|
||||
docker build -f ./Dockerfile -t llama-factory:latest .
|
||||
|
||||
docker run --gpus=all -v ./hf_cache:/root/.cache/huggingface/ -v ./data:/app/data -v ./output:/app/output -p 7860:7860 --shm-size 16G --name llama_factory -d llama-factory:latest
|
||||
docker run --gpus=all \
|
||||
-v ./hf_cache:/root/.cache/huggingface/ \
|
||||
-v ./data:/app/data \
|
||||
-v ./output:/app/output \
|
||||
-e CUDA_VISIBLE_DEVICES=0 \
|
||||
-p 7860:7860 \
|
||||
--shm-size 16G \
|
||||
--name llama_factory \
|
||||
-d llama-factory:latest
|
||||
```
|
||||
|
||||
#### Docker Compose support
|
||||
#### Use Docker Compose
|
||||
|
||||
```bash
|
||||
docker compose -f ./docker-compose.yml up -d
|
||||
|
@ -682,7 +686,7 @@ docker compose -f ./docker-compose.yml up -d
|
|||
|
||||
> [!TIP]
|
||||
> Details about volume:
|
||||
> * hf_cache: Utilize Huggingface cache on the host machine. Reassignable if a cache already exists in a different directory.
|
||||
> * hf_cache: Utilize Hugging Face cache on the host machine. Reassignable if a cache already exists in a different directory.
|
||||
> * data: Place datasets on this dir of the host machine so that they can be selected on LLaMA Board GUI.
|
||||
> * output: Set export dir to this location so that the merged result can be accessed directly on the host machine.
|
||||
|
||||
|
|
36
README_zh.md
36
README_zh.md
|
@ -76,10 +76,10 @@ https://github.com/hiyouga/LLaMA-Factory/assets/16256802/ec36a9dd-37f4-4f72-81bd
|
|||
|
||||
[24/03/07] 我们支持了梯度低秩投影(**[GaLore](https://arxiv.org/abs/2403.03507)**)算法。详细用法请参照 `examples/extras/galore`。
|
||||
|
||||
[24/03/07] 我们集成了 **[vLLM](https://github.com/vllm-project/vllm)** 以实现极速并发推理。请使用 `--infer_backend vllm` 来获得 **270%** 的推理速度。(尚不支持 LoRA,请先合并权重。)
|
||||
|
||||
<details><summary>展开日志</summary>
|
||||
|
||||
[24/03/07] 我们集成了 **[vLLM](https://github.com/vllm-project/vllm)** 以实现极速并发推理。请使用 `--infer_backend vllm` 来获得 **270%** 的推理速度。(尚不支持 LoRA,请先合并权重。)
|
||||
|
||||
[24/02/28] 我们支持了 **[DoRA](https://arxiv.org/abs/2402.09353)** 微调。请使用 `--use_dora` 参数进行 DoRA 微调。
|
||||
|
||||
[24/02/15] 我们支持了 [LLaMA Pro](https://github.com/TencentARC/LLaMA-Pro) 提出的**块扩展**方法。详细用法请参照 `examples/extras/llama_pro`。
|
||||
|
@ -585,7 +585,7 @@ CUDA_VISIBLE_DEVICES= python src/export_model.py \
|
|||
> [!TIP]
|
||||
> 仅使用 `--model_name_or_path path_to_export` 来加载导出后的模型。
|
||||
>
|
||||
> 合并 LoRA 权重之后可再次使用 `--export_quantization_bit 4` 和 `--export_quantization_dataset data/c4_demo.json` 基于 AutoGPTQ 量化模型。
|
||||
> 合并 LoRA 权重之后可再次使用 `CUDA_VISIBLE_DEVICES=0`、`--export_quantization_bit 4` 和 `--export_quantization_dataset data/c4_demo.json` 基于 AutoGPTQ 量化模型。
|
||||
|
||||
### 使用 OpenAI 风格 API 推理
|
||||
|
||||
|
@ -659,6 +659,36 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
|
|||
> [!TIP]
|
||||
> 我们建议在量化模型的预测中使用 `--per_device_eval_batch_size=1` 和 `--max_target_length 128`。
|
||||
|
||||
### 使用容器
|
||||
|
||||
#### 使用 Docker
|
||||
|
||||
```bash
|
||||
docker build -f ./Dockerfile -t llama-factory:latest .
|
||||
|
||||
docker run --gpus=all \
|
||||
-v ./hf_cache:/root/.cache/huggingface/ \
|
||||
-v ./data:/app/data \
|
||||
-v ./output:/app/output \
|
||||
-e CUDA_VISIBLE_DEVICES=0 \
|
||||
-p 7860:7860 \
|
||||
--shm-size 16G \
|
||||
--name llama_factory \
|
||||
-d llama-factory:latest
|
||||
```
|
||||
|
||||
#### 使用 Docker Compose
|
||||
|
||||
```bash
|
||||
docker compose -f ./docker-compose.yml up -d
|
||||
```
|
||||
|
||||
> [!TIP]
|
||||
> 数据卷详情:
|
||||
> * hf_cache:使用宿主机的 Hugging Face 缓存文件夹,允许更改为新的目录。
|
||||
> * data:宿主机中存放数据集的文件夹路径。
|
||||
> * output:将导出目录设置为该路径后,即可在宿主机中访问导出后的模型。
|
||||
|
||||
## 使用了 LLaMA Factory 的项目
|
||||
|
||||
1. Wang et al. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation. 2023. [[arxiv]](https://arxiv.org/abs/2308.02223)
|
||||
|
|
|
@ -10,6 +10,8 @@ services:
|
|||
- ./hf_cache:/root/.cache/huggingface/
|
||||
- ./data:/app/data
|
||||
- ./output:/app/output
|
||||
environment:
|
||||
- CUDA_VISIBLE_DEVICES=0
|
||||
ports:
|
||||
- "7860:7860"
|
||||
ipc: host
|
||||
|
|
Loading…
Reference in New Issue