diff --git a/README.md b/README.md
index 347ebe7e..d10ef982 100644
--- a/README.md
+++ b/README.md
@@ -276,18 +276,19 @@ huggingface-cli login
| ------------ | ------- | --------- |
| python | 3.8 | 3.10 |
| torch | 1.13.1 | 2.2.0 |
-| transformers | 4.37.2 | 4.39.3 |
-| datasets | 2.14.3 | 2.18.0 |
-| accelerate | 0.27.2 | 0.28.0 |
+| transformers | 4.37.2 | 4.40.1 |
+| datasets | 2.14.3 | 2.19.1 |
+| accelerate | 0.27.2 | 0.30.0 |
| peft | 0.9.0 | 0.10.0 |
-| trl | 0.8.1 | 0.8.1 |
+| trl | 0.8.1 | 0.8.6 |
| Optional | Minimum | Recommend |
| ------------ | ------- | --------- |
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.14.0 |
-| bitsandbytes | 0.39.0 | 0.43.0 |
-| flash-attn | 2.3.0 | 2.5.6 |
+| bitsandbytes | 0.39.0 | 0.43.1 |
+| vllm | 0.4.0 | 0.4.2 |
+| flash-attn | 2.3.0 | 2.5.8 |
### Hardware Requirement
@@ -305,24 +306,15 @@ huggingface-cli login
## Getting Started
-### Data Preparation
-
-Please refer to [data/README.md](data/README.md) for checking the details about the format of dataset files. You can either use datasets on HuggingFace / ModelScope hub or load the dataset in local disk.
-
-> [!NOTE]
-> Please update `data/dataset_info.json` to use your custom dataset.
-
-### Dependence Installation
+### Installation
```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git
-conda create -n llama_factory python=3.10
-conda activate llama_factory
cd LLaMA-Factory
pip install -e .[metrics]
```
-Extra dependencies available: deepspeed, metrics, galore, badam, vllm, bitsandbytes, gptq, awq, aqlm, qwen, modelscope, quality
+Extra dependencies available: metrics, deepspeed, bitsandbytes, vllm, galore, badam, gptq, awq, aqlm, qwen, modelscope, quality
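+
+For example, a sketch of installing several extras at once (quote the brackets if your shell expands them, as zsh does):
+
+```bash
+pip install -e ".[metrics,deepspeed,bitsandbytes]"
+```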
For Windows users
@@ -336,19 +328,41 @@ To enable FlashAttention-2 on the Windows platform, you need to install the prec
-### Train with LLaMA Board GUI (powered by [Gradio](https://github.com/gradio-app/gradio))
+### Data Preparation
+
+Please refer to [data/README.md](data/README.md) for details on the dataset file format. You can either use datasets on the HuggingFace / ModelScope hub or load a dataset from local disk.
+
+> [!NOTE]
+> Please update `data/dataset_info.json` to use your custom dataset.
+
+### Quickstart
+
+The following three commands run LoRA fine-tuning, inference, and merging for the Llama3-8B-Instruct model, respectively.
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_lora_sft.yaml
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+```
+
+See [examples/README.md](examples/README.md) for advanced usage.
+
+> [!TIP]
+> Use `llamafactory-cli help` to show help information.
+
+### Use LLaMA Board GUI (powered by [Gradio](https://github.com/gradio-app/gradio))
> [!IMPORTANT]
-> LLaMA Board GUI only supports training on a single GPU, please use [CLI](#train-with-command-line-interface) for distributed training.
+> LLaMA Board GUI only supports training on a single GPU.
#### Use local environment
```bash
-llamafactory-cli webui
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli webui
```
> [!TIP]
-> To modify the default setting in the LLaMA Board GUI, you can use environment variables, e.g., `export CUDA_VISIBLE_DEVICES=0 GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 GRADIO_SHARE=False` (use `set` command on Windows OS).
+> To modify the default settings of the LLaMA Board GUI, use environment variables, e.g., `export GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 GRADIO_SHARE=False` (use the `set` command on Windows).
For Alibaba Cloud users
@@ -389,21 +403,10 @@ docker compose -f ./docker-compose.yml up -d
-### Train with Command Line Interface
-
-See [examples/README.md](examples/README.md) for usage.
-
-> [!TIP]
-> Use `llamafactory-cli train -h` to display arguments description.
-
### Deploy with OpenAI-style API and vLLM
```bash
-CUDA_VISIBLE_DEVICES=0,1 API_PORT=8000 llamafactory-cli api \
- --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
- --template llama3 \
- --infer_backend vllm \
- --vllm_enforce_eager
+CUDA_VISIBLE_DEVICES=0,1 API_PORT=8000 llamafactory-cli api examples/inference/llama3_vllm.yaml
```
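+
+Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch using `curl` (the port matches `API_PORT` above; the `model` field is a placeholder):
+
+```bash
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'
+```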
### Download from ModelScope Hub
diff --git a/README_zh.md b/README_zh.md
index 8a2fb79b..9c639f2c 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -163,7 +163,7 @@ https://github.com/hiyouga/LLaMA-Factory/assets/16256802/ec36a9dd-37f4-4f72-81bd
| [Yuan](https://huggingface.co/IEITYuan) | 2B/51B/102B | q_proj,v_proj | yuan |
> [!NOTE]
-> **默认模块**应作为 `--lora_target` 参数的默认值,可使用 `--lora_target all` 参数指定全部模块以得到更好的效果。
+> **默认模块**应作为 `--lora_target` 参数的默认值,可使用 `--lora_target all` 参数指定全部模块以取得更好的效果。
>
> 对于所有“基座”(Base)模型,`--template` 参数可以是 `default`, `alpaca`, `vicuna` 等任意值。但“对话”(Instruct/Chat)模型请务必使用**对应的模板**。
>
@@ -276,18 +276,19 @@ huggingface-cli login
| ------------ | ------- | --------- |
| python | 3.8 | 3.10 |
| torch | 1.13.1 | 2.2.0 |
-| transformers | 4.37.2 | 4.39.3 |
-| datasets | 2.14.3 | 2.18.0 |
-| accelerate | 0.27.2 | 0.28.0 |
+| transformers | 4.37.2 | 4.40.1 |
+| datasets | 2.14.3 | 2.19.1 |
+| accelerate | 0.27.2 | 0.30.0 |
| peft | 0.9.0 | 0.10.0 |
-| trl | 0.8.1 | 0.8.1 |
+| trl | 0.8.1 | 0.8.6 |
| 可选项 | 至少 | 推荐 |
| ------------ | ------- | --------- |
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.14.0 |
-| bitsandbytes | 0.39.0 | 0.43.0 |
-| flash-attn | 2.3.0 | 2.5.6 |
+| bitsandbytes | 0.39.0 | 0.43.1 |
+| vllm | 0.4.0 | 0.4.2 |
+| flash-attn | 2.3.0 | 2.5.8 |
### 硬件依赖
@@ -305,24 +306,15 @@ huggingface-cli login
## 如何使用
-### 数据准备
-
-关于数据集文件的格式,请参考 [data/README_zh.md](data/README_zh.md) 的内容。你可以使用 HuggingFace / ModelScope 上的数据集或加载本地数据集。
-
-> [!NOTE]
-> 使用自定义数据集时,请更新 `data/dataset_info.json` 文件。
-
-### 安装依赖
+### 安装 LLaMA Factory
```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git
-conda create -n llama_factory python=3.10
-conda activate llama_factory
cd LLaMA-Factory
pip install -e .[metrics]
```
-可选的额外依赖项:deepspeed、metrics、galore、badam、vllm、bitsandbytes、gptq、awq、aqlm、qwen、modelscope、quality
+可选的额外依赖项:metrics、deepspeed、bitsandbytes、vllm、galore、badam、gptq、awq、aqlm、qwen、modelscope、quality
Windows 用户指南
@@ -336,19 +328,41 @@ pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/downl
-### 利用 LLaMA Board 可视化界面训练(由 [Gradio](https://github.com/gradio-app/gradio) 驱动)
+### 数据准备
+
+关于数据集文件的格式,请参考 [data/README_zh.md](data/README_zh.md) 的内容。你可以使用 HuggingFace / ModelScope 上的数据集或加载本地数据集。
+
+> [!NOTE]
+> 使用自定义数据集时,请更新 `data/dataset_info.json` 文件。
+
+### 快速开始
+
+下面三行命令分别对 Llama3-8B-Instruct 模型进行 LoRA 微调、推理和合并。
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_lora_sft.yaml
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+```
+
+高级用法请参考 [examples/README_zh.md](examples/README_zh.md)。
+
+> [!TIP]
+> 使用 `llamafactory-cli help` 显示使用帮助。
+
+### 使用 LLaMA Board 可视化界面(由 [Gradio](https://github.com/gradio-app/gradio) 驱动)
> [!IMPORTANT]
-> LLaMA Board 可视化界面目前仅支持单 GPU 训练,请使用[命令行接口](#利用命令行接口训练)来进行多 GPU 分布式训练。
+> LLaMA Board 可视化界面目前仅支持单 GPU 训练。
#### 使用本地环境
```bash
-llamafactory-cli webui
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli webui
```
> [!TIP]
-> 您可以使用环境变量来修改 LLaMA Board 可视化界面的默认设置,例如 `export CUDA_VISIBLE_DEVICES=0 GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 GRADIO_SHARE=False`(Windows 系统可使用 `set` 指令)。
+> 您可以使用环境变量来修改 LLaMA Board 可视化界面的默认设置,例如 `export GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 GRADIO_SHARE=False`(Windows 系统可使用 `set` 指令)。
阿里云用户指南
@@ -389,21 +403,10 @@ docker compose -f ./docker-compose.yml up -d
-### 利用命令行接口训练
-
-使用方法请参考 [examples/README_zh.md](examples/README_zh.md)。
-
-> [!TIP]
-> 您可以执行 `llamafactory-cli train -h` 来查看参数文档。
-
### 利用 vLLM 部署 OpenAI API
```bash
-CUDA_VISIBLE_DEVICES=0,1 API_PORT=8000 llamafactory-cli api \
- --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
- --template llama3 \
- --infer_backend vllm \
- --vllm_enforce_eager
+CUDA_VISIBLE_DEVICES=0,1 API_PORT=8000 llamafactory-cli api examples/inference/llama3_vllm.yaml
```
### 从魔搭社区下载
diff --git a/examples/README.md b/examples/README.md
index 895e9c72..0a14c5bd 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -1,9 +1,16 @@
We provide diverse examples about fine-tuning LLMs.
+```bash
+export CUDA_VISIBLE_DEVICES=0
+cd examples/lora_single_gpu
+llamafactory-cli train llama3_lora_pretrain.yaml # Do continuous pre-training using LoRA
+```
+
```
examples/
├── lora_single_gpu/
-│ ├── pretrain.sh: Do continuous pre-training using LoRA
+│ ├── llama3_lora_pretrain.yaml: Do continuous pre-training using LoRA
│ ├── sft.sh: Do supervised fine-tuning using LoRA
│ ├── reward.sh: Do reward modeling using LoRA
│ ├── ppo.sh: Do PPO training using LoRA
diff --git a/examples/extras/badam/sft.sh b/examples/extras/badam/sft.sh
index 4bcfe9d2..61167dad 100644
--- a/examples/extras/badam/sft.sh
+++ b/examples/extras/badam/sft.sh
@@ -10,7 +10,7 @@ CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
--finetuning_type full \
--use_badam \
--badam_switch_mode descending \
- --badam_switch_interval 50 \
+ --badam_switch_block_every 50 \
--badam_verbose 2 \
--output_dir ../../../saves/LLaMA2-7B/badam/sft \
--overwrite_cache \
diff --git a/examples/inference/api_demo.sh b/examples/inference/api_demo.sh
deleted file mode 100644
index 6f0f1b2e..00000000
--- a/examples/inference/api_demo.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 API_PORT=8000 llamafactory-cli api \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
- --template default \
- --finetuning_type lora
diff --git a/examples/inference/cli_demo.sh b/examples/inference/cli_demo.sh
deleted file mode 100644
index bc762411..00000000
--- a/examples/inference/cli_demo.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
- --template default \
- --finetuning_type lora
diff --git a/examples/inference/evaluate.sh b/examples/inference/evaluate.sh
deleted file mode 100644
index 5030329d..00000000
--- a/examples/inference/evaluate.sh
+++ /dev/null
@@ -1,12 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli eval \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
- --template fewshot \
- --finetuning_type lora \
- --task mmlu \
- --split test \
- --lang en \
- --n_shot 5 \
- --batch_size 4
diff --git a/examples/inference/llama3.yaml b/examples/inference/llama3.yaml
new file mode 100644
index 00000000..ffc5be82
--- /dev/null
+++ b/examples/inference/llama3.yaml
@@ -0,0 +1,2 @@
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+template: llama3
diff --git a/examples/inference/llama3_lora_sft.yaml b/examples/inference/llama3_lora_sft.yaml
new file mode 100644
index 00000000..262f4445
--- /dev/null
+++ b/examples/inference/llama3_lora_sft.yaml
@@ -0,0 +1,4 @@
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+template: llama3
+finetuning_type: lora
diff --git a/examples/inference/llama3_vllm.yaml b/examples/inference/llama3_vllm.yaml
new file mode 100644
index 00000000..8dd3b61a
--- /dev/null
+++ b/examples/inference/llama3_vllm.yaml
@@ -0,0 +1,4 @@
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+template: llama3
+infer_backend: vllm
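+# enforce eager mode in vLLM (skips CUDA graph capture, trading throughput for lower memory)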
+vllm_enforce_eager: true
diff --git a/examples/inference/web_demo.sh b/examples/inference/web_demo.sh
deleted file mode 100644
index a58cd2a0..00000000
--- a/examples/inference/web_demo.sh
+++ /dev/null
@@ -1,8 +0,0 @@
-#!/bin/bash
-# add `--visual_inputs True` to load MLLM
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli webchat \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
- --template default \
- --finetuning_type lora
diff --git a/examples/lora_single_gpu/dpo.sh b/examples/lora_single_gpu/dpo.sh
deleted file mode 100644
index 2cb6cb01..00000000
--- a/examples/lora_single_gpu/dpo.sh
+++ /dev/null
@@ -1,35 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
- --stage dpo \
- --do_train \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
- --create_new_adapter \
- --dataset orca_rlhf \
- --dataset_dir ../../data \
- --template default \
- --finetuning_type lora \
- --lora_target q_proj,v_proj \
- --output_dir ../../saves/LLaMA2-7B/lora/dpo \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 1024 \
- --preprocessing_num_workers 16 \
- --per_device_train_batch_size 1 \
- --per_device_eval_batch_size 1 \
- --gradient_accumulation_steps 8 \
- --lr_scheduler_type cosine \
- --logging_steps 10 \
- --warmup_steps 20 \
- --save_steps 100 \
- --eval_steps 100 \
- --evaluation_strategy steps \
- --load_best_model_at_end \
- --learning_rate 1e-5 \
- --num_train_epochs 1.0 \
- --max_samples 1000 \
- --val_size 0.1 \
- --dpo_ftx 1.0 \
- --plot_loss \
- --fp16
diff --git a/examples/lora_single_gpu/llama3_lora_dpo.yaml b/examples/lora_single_gpu/llama3_lora_dpo.yaml
new file mode 100644
index 00000000..f71f752d
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_dpo.yaml
@@ -0,0 +1,39 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+
+# method
+stage: dpo
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
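+# coefficient of the supervised fine-tuning loss mixed into the DPO objective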
+dpo_ftx: 1.0
+
+# dataset
+dataset: orca_rlhf
+template: llama3
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/dpo
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.00001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 500
diff --git a/examples/lora_single_gpu/llama3_lora_eval.yaml b/examples/lora_single_gpu/llama3_lora_eval.yaml
new file mode 100644
index 00000000..5808a47a
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_eval.yaml
@@ -0,0 +1,19 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+
+# method
+finetuning_type: lora
+
+# dataset
+task: mmlu
+split: test
+template: fewshot
+lang: en
+n_shot: 5
+
+# output
+save_dir: saves/llama3-8b/lora/eval
+
+# eval
+batch_size: 4
diff --git a/examples/lora_single_gpu/llama3_lora_orpo.yaml b/examples/lora_single_gpu/llama3_lora_orpo.yaml
new file mode 100644
index 00000000..5d78d260
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_orpo.yaml
@@ -0,0 +1,38 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+
+# method
+stage: orpo
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: orca_rlhf
+template: llama3
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/orpo
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.00001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 500
diff --git a/examples/lora_single_gpu/llama3_lora_ppo.yaml b/examples/lora_single_gpu/llama3_lora_ppo.yaml
new file mode 100644
index 00000000..8d78d20d
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_ppo.yaml
@@ -0,0 +1,38 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
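+# the reward model below is produced by llama3_lora_reward.yaml; train it first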
+reward_model: saves/llama3-8b/lora/reward
+
+# method
+stage: ppo
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: identity,alpaca_gpt4_en
+template: llama3
+cutoff_len: 1024
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/ppo
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.00001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# generate
+max_new_tokens: 512
+top_k: 0
+top_p: 0.9
diff --git a/examples/lora_single_gpu/llama3_lora_predict.yaml b/examples/lora_single_gpu/llama3_lora_predict.yaml
new file mode 100644
index 00000000..5a9de686
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_predict.yaml
@@ -0,0 +1,24 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+
+# method
+stage: sft
+do_predict: true
+finetuning_type: lora
+
+# dataset
+dataset: identity,alpaca_gpt4_en
+template: llama3
+cutoff_len: 1024
+max_samples: 50
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/predict
+overwrite_output_dir: true
+
+# eval
+per_device_eval_batch_size: 1
+predict_with_generate: true
diff --git a/examples/lora_single_gpu/llama3_lora_pretrain.yaml b/examples/lora_single_gpu/llama3_lora_pretrain.yaml
new file mode 100644
index 00000000..64245b71
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_pretrain.yaml
@@ -0,0 +1,37 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+
+# method
+stage: pt
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: c4_demo
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/pretrain
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.0001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 500
diff --git a/examples/lora_single_gpu/llama3_lora_reward.yaml b/examples/lora_single_gpu/llama3_lora_reward.yaml
new file mode 100644
index 00000000..f190f4ac
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_reward.yaml
@@ -0,0 +1,38 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+
+# method
+stage: rm
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: orca_rlhf
+template: llama3
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/reward
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.00001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 500
diff --git a/examples/lora_single_gpu/llama3_lora_sft.yaml b/examples/lora_single_gpu/llama3_lora_sft.yaml
new file mode 100644
index 00000000..f99df305
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_sft.yaml
@@ -0,0 +1,38 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+
+# method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: identity,alpaca_gpt4_en
+template: llama3
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.0001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 500
diff --git a/examples/lora_single_gpu/llama3_preprocess.yaml b/examples/lora_single_gpu/llama3_preprocess.yaml
new file mode 100644
index 00000000..04df9631
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_preprocess.yaml
@@ -0,0 +1,22 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+
+# method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: identity,alpaca_gpt4_en
+template: llama3
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+tokenized_path: saves/llama3-8b/dataset/sft # set `tokenized_path` here in later configs to load the preprocessed dataset
+
+# output
+output_dir: saves/llama3-8b/lora/sft
+overwrite_output_dir: true
diff --git a/examples/lora_single_gpu/llava1_5_lora_sft.yaml b/examples/lora_single_gpu/llava1_5_lora_sft.yaml
new file mode 100644
index 00000000..96c2701a
--- /dev/null
+++ b/examples/lora_single_gpu/llava1_5_lora_sft.yaml
@@ -0,0 +1,39 @@
+# model
+model_name_or_path: llava-hf/llava-1.5-7b-hf
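+# enable multimodal (image-text) inputs for MLLMs such as LLaVA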
+visual_inputs: true
+
+# method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: mllm_demo
+template: vicuna
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llava1_5-7b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.0001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 500
diff --git a/examples/lora_single_gpu/orpo.sh b/examples/lora_single_gpu/orpo.sh
deleted file mode 100644
index 335707bf..00000000
--- a/examples/lora_single_gpu/orpo.sh
+++ /dev/null
@@ -1,32 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
- --stage orpo \
- --do_train \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --dataset orca_rlhf \
- --dataset_dir ../../data \
- --template default \
- --finetuning_type lora \
- --lora_target q_proj,v_proj \
- --output_dir ../../saves/LLaMA2-7B/lora/orpo \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 1024 \
- --preprocessing_num_workers 16 \
- --per_device_train_batch_size 1 \
- --per_device_eval_batch_size 1 \
- --gradient_accumulation_steps 8 \
- --lr_scheduler_type cosine \
- --logging_steps 10 \
- --warmup_steps 20 \
- --save_steps 100 \
- --eval_steps 100 \
- --evaluation_strategy steps \
- --load_best_model_at_end \
- --learning_rate 1e-5 \
- --num_train_epochs 1.0 \
- --max_samples 1000 \
- --val_size 0.1 \
- --plot_loss \
- --fp16
diff --git a/examples/lora_single_gpu/ppo.sh b/examples/lora_single_gpu/ppo.sh
deleted file mode 100644
index 9eccb05e..00000000
--- a/examples/lora_single_gpu/ppo.sh
+++ /dev/null
@@ -1,32 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
- --stage ppo \
- --do_train \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
- --create_new_adapter \
- --dataset alpaca_gpt4_en \
- --dataset_dir ../../data \
- --template default \
- --finetuning_type lora \
- --lora_target q_proj,v_proj \
- --reward_model ../../saves/LLaMA2-7B/lora/reward \
- --output_dir ../../saves/LLaMA2-7B/lora/ppo \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 512 \
- --preprocessing_num_workers 16 \
- --per_device_train_batch_size 1 \
- --gradient_accumulation_steps 8 \
- --lr_scheduler_type cosine \
- --logging_steps 10 \
- --save_steps 100 \
- --learning_rate 1e-5 \
- --num_train_epochs 1.0 \
- --max_samples 1000 \
- --top_k 0 \
- --top_p 0.9 \
- --max_new_tokens 256 \
- --plot_loss \
- --fp16
diff --git a/examples/lora_single_gpu/predict.sh b/examples/lora_single_gpu/predict.sh
deleted file mode 100644
index 250efed1..00000000
--- a/examples/lora_single_gpu/predict.sh
+++ /dev/null
@@ -1,19 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
- --stage sft \
- --do_predict \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft,../../saves/LLaMA2-7B/lora/dpo \
- --dataset alpaca_gpt4_en,glaive_toolcall \
- --dataset_dir ../../data \
- --template default \
- --finetuning_type lora \
- --output_dir ../../saves/LLaMA2-7B/lora/predict \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 1024 \
- --preprocessing_num_workers 16 \
- --per_device_eval_batch_size 1 \
- --max_samples 20 \
- --predict_with_generate
diff --git a/examples/lora_single_gpu/prepare.sh b/examples/lora_single_gpu/prepare.sh
deleted file mode 100644
index 277f9b7a..00000000
--- a/examples/lora_single_gpu/prepare.sh
+++ /dev/null
@@ -1,19 +0,0 @@
-#!/bin/bash
-# use `--tokenized_path` in training script to load data
-
-CUDA_VISIBLE_DEVICES= llamafactory-cli train \
- --stage sft \
- --do_train \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --dataset alpaca_gpt4_en,glaive_toolcall \
- --dataset_dir ../../data \
- --template default \
- --finetuning_type lora \
- --lora_target q_proj,v_proj \
- --output_dir ../../saves/LLaMA2-7B/lora/sft \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 1024 \
- --preprocessing_num_workers 16 \
- --max_samples 3000 \
- --tokenized_path ../../saves/datasets/sft
diff --git a/examples/lora_single_gpu/pretrain.sh b/examples/lora_single_gpu/pretrain.sh
deleted file mode 100644
index 0782f00c..00000000
--- a/examples/lora_single_gpu/pretrain.sh
+++ /dev/null
@@ -1,31 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
- --stage pt \
- --do_train \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --dataset c4_demo \
- --dataset_dir ../../data \
- --finetuning_type lora \
- --lora_target q_proj,v_proj \
- --output_dir ../../saves/LLaMA2-7B/lora/pretrain \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 1024 \
- --preprocessing_num_workers 16 \
- --per_device_train_batch_size 1 \
- --per_device_eval_batch_size 1 \
- --gradient_accumulation_steps 8 \
- --lr_scheduler_type cosine \
- --logging_steps 10 \
- --warmup_steps 20 \
- --save_steps 100 \
- --eval_steps 100 \
- --evaluation_strategy steps \
- --load_best_model_at_end \
- --learning_rate 5e-5 \
- --num_train_epochs 3.0 \
- --max_samples 10000 \
- --val_size 0.1 \
- --plot_loss \
- --fp16
diff --git a/examples/lora_single_gpu/reward.sh b/examples/lora_single_gpu/reward.sh
deleted file mode 100644
index 678809fd..00000000
--- a/examples/lora_single_gpu/reward.sh
+++ /dev/null
@@ -1,33 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
- --stage rm \
- --do_train \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
- --create_new_adapter \
- --dataset orca_rlhf \
- --dataset_dir ../../data \
- --template default \
- --finetuning_type lora \
- --lora_target q_proj,v_proj \
- --output_dir ../../saves/LLaMA2-7B/lora/reward \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 1024 \
- --preprocessing_num_workers 16 \
- --per_device_train_batch_size 1 \
- --per_device_eval_batch_size 1 \
- --gradient_accumulation_steps 8 \
- --lr_scheduler_type cosine \
- --logging_steps 10 \
- --warmup_steps 20 \
- --save_steps 100 \
- --eval_steps 100 \
- --evaluation_strategy steps \
- --learning_rate 1e-5 \
- --num_train_epochs 1.0 \
- --max_samples 5000 \
- --val_size 0.1 \
- --plot_loss \
- --fp16
diff --git a/examples/lora_single_gpu/sft.sh b/examples/lora_single_gpu/sft.sh
deleted file mode 100644
index 2047e21f..00000000
--- a/examples/lora_single_gpu/sft.sh
+++ /dev/null
@@ -1,32 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
- --stage sft \
- --do_train \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --dataset alpaca_gpt4_en,glaive_toolcall \
- --dataset_dir ../../data \
- --template default \
- --finetuning_type lora \
- --lora_target q_proj,v_proj \
- --output_dir ../../saves/LLaMA2-7B/lora/sft \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 1024 \
- --preprocessing_num_workers 16 \
- --per_device_train_batch_size 1 \
- --per_device_eval_batch_size 1 \
- --gradient_accumulation_steps 8 \
- --lr_scheduler_type cosine \
- --logging_steps 10 \
- --warmup_steps 20 \
- --save_steps 100 \
- --eval_steps 100 \
- --evaluation_strategy steps \
- --load_best_model_at_end \
- --learning_rate 5e-5 \
- --num_train_epochs 3.0 \
- --max_samples 3000 \
- --val_size 0.1 \
- --plot_loss \
- --fp16
diff --git a/examples/lora_single_gpu/sft_mllm.sh b/examples/lora_single_gpu/sft_mllm.sh
deleted file mode 100644
index 53e37262..00000000
--- a/examples/lora_single_gpu/sft_mllm.sh
+++ /dev/null
@@ -1,33 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
- --stage sft \
- --do_train \
- --model_name_or_path llava-hf/llava-1.5-7b-hf \
- --visual_inputs \
- --dataset mllm_demo \
- --dataset_dir ../../data \
- --template vicuna \
- --finetuning_type lora \
- --lora_target q_proj,v_proj \
- --output_dir ../../saves/LLaMA2-7B/lora/sft_mllm \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 1024 \
- --preprocessing_num_workers 16 \
- --per_device_train_batch_size 1 \
- --per_device_eval_batch_size 1 \
- --gradient_accumulation_steps 8 \
- --lr_scheduler_type cosine \
- --logging_steps 10 \
- --warmup_steps 20 \
- --save_steps 100 \
- --eval_steps 100 \
- --evaluation_strategy steps \
- --load_best_model_at_end \
- --learning_rate 5e-5 \
- --num_train_epochs 100.0 \
- --max_samples 3000 \
- --val_size 0.1 \
- --plot_loss \
- --fp16
diff --git a/examples/merge_lora/llama3_gptq.yaml b/examples/merge_lora/llama3_gptq.yaml
new file mode 100644
index 00000000..eac12f90
--- /dev/null
+++ b/examples/merge_lora/llama3_gptq.yaml
@@ -0,0 +1,11 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+template: llama3
+
+# export
+export_dir: models/llama3_gptq
+export_quantization_bit: 4
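+# calibration dataset for GPTQ post-training quantization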
+export_quantization_dataset: data/c4_demo.json
+export_size: 2
+export_device: cpu
+export_legacy_format: false
diff --git a/examples/merge_lora/llama3_lora_sft.yaml b/examples/merge_lora/llama3_lora_sft.yaml
new file mode 100644
index 00000000..508a0b8c
--- /dev/null
+++ b/examples/merge_lora/llama3_lora_sft.yaml
@@ -0,0 +1,13 @@
+# Note: DO NOT use a quantized model or quantization_bit when merging LoRA weights
+
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+template: llama3
+finetuning_type: lora
+
+# export
+export_dir: models/llama3_lora_sft
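+# maximum size (in GB) of each exported model shard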
+export_size: 2
+export_device: cpu
+export_legacy_format: false
diff --git a/examples/merge_lora/merge.sh b/examples/merge_lora/merge.sh
deleted file mode 100644
index 186e64a4..00000000
--- a/examples/merge_lora/merge.sh
+++ /dev/null
@@ -1,12 +0,0 @@
-#!/bin/bash
-# DO NOT use quantized model or quantization_bit when merging lora weights
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli export \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
- --template default \
- --finetuning_type lora \
- --export_dir ../../models/llama2-7b-sft \
- --export_size 2 \
- --export_device cpu \
- --export_legacy_format False
diff --git a/examples/merge_lora/quantize.sh b/examples/merge_lora/quantize.sh
deleted file mode 100644
index 4a104645..00000000
--- a/examples/merge_lora/quantize.sh
+++ /dev/null
@@ -1,11 +0,0 @@
-#!/bin/bash
-# NEED TO run `merge.sh` before using this script
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli export \
- --model_name_or_path ../../models/llama2-7b-sft \
- --template default \
- --export_dir ../../models/llama2-7b-sft-int4 \
- --export_quantization_bit 4 \
- --export_quantization_dataset ../../data/c4_demo.json \
- --export_size 2 \
- --export_legacy_format False
diff --git a/examples/qlora_single_gpu/aqlm.sh b/examples/qlora_single_gpu/aqlm.sh
deleted file mode 100644
index 1e0a71ca..00000000
--- a/examples/qlora_single_gpu/aqlm.sh
+++ /dev/null
@@ -1,30 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
- --stage sft \
- --do_train \
- --model_name_or_path BlackSamorez/Llama-2-7b-AQLM-2Bit-1x16-hf \
- --dataset alpaca_gpt4_en,glaive_toolcall \
- --dataset_dir ../../data \
- --template default \
- --finetuning_type lora \
- --lora_target q_proj,v_proj \
- --output_dir ../../saves/LLaMA2-7B/lora/sft \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 1024 \
- --per_device_train_batch_size 1 \
- --per_device_eval_batch_size 1 \
- --gradient_accumulation_steps 8 \
- --lr_scheduler_type cosine \
- --logging_steps 10 \
- --save_steps 100 \
- --eval_steps 100 \
- --evaluation_strategy steps \
- --load_best_model_at_end \
- --learning_rate 5e-5 \
- --num_train_epochs 3.0 \
- --max_samples 3000 \
- --val_size 0.1 \
- --plot_loss \
- --fp16
diff --git a/examples/qlora_single_gpu/awq.sh b/examples/qlora_single_gpu/awq.sh
deleted file mode 100644
index c13c8134..00000000
--- a/examples/qlora_single_gpu/awq.sh
+++ /dev/null
@@ -1,30 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
- --stage sft \
- --do_train \
- --model_name_or_path TheBloke/Llama-2-7B-AWQ \
- --dataset alpaca_gpt4_en,glaive_toolcall \
- --dataset_dir ../../data \
- --template default \
- --finetuning_type lora \
- --lora_target q_proj,v_proj \
- --output_dir ../../saves/LLaMA2-7B/lora/sft \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 1024 \
- --per_device_train_batch_size 1 \
- --per_device_eval_batch_size 1 \
- --gradient_accumulation_steps 8 \
- --lr_scheduler_type cosine \
- --logging_steps 10 \
- --save_steps 100 \
- --eval_steps 100 \
- --evaluation_strategy steps \
- --load_best_model_at_end \
- --learning_rate 5e-5 \
- --num_train_epochs 3.0 \
- --max_samples 3000 \
- --val_size 0.1 \
- --plot_loss \
- --fp16
diff --git a/examples/qlora_single_gpu/bitsandbytes.sh b/examples/qlora_single_gpu/bitsandbytes.sh
deleted file mode 100644
index 27f48d41..00000000
--- a/examples/qlora_single_gpu/bitsandbytes.sh
+++ /dev/null
@@ -1,31 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
- --stage sft \
- --do_train \
- --model_name_or_path meta-llama/Llama-2-7b-hf \
- --dataset alpaca_gpt4_en,glaive_toolcall \
- --dataset_dir ../../data \
- --template default \
- --finetuning_type lora \
- --lora_target q_proj,v_proj \
- --output_dir ../../saves/LLaMA2-7B/lora/sft \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 1024 \
- --per_device_train_batch_size 1 \
- --per_device_eval_batch_size 1 \
- --gradient_accumulation_steps 8 \
- --lr_scheduler_type cosine \
- --logging_steps 10 \
- --save_steps 100 \
- --eval_steps 100 \
- --evaluation_strategy steps \
- --load_best_model_at_end \
- --learning_rate 5e-5 \
- --num_train_epochs 3.0 \
- --max_samples 3000 \
- --val_size 0.1 \
- --quantization_bit 4 \
- --plot_loss \
- --fp16
diff --git a/examples/qlora_single_gpu/gptq.sh b/examples/qlora_single_gpu/gptq.sh
deleted file mode 100644
index 5b1b80e1..00000000
--- a/examples/qlora_single_gpu/gptq.sh
+++ /dev/null
@@ -1,30 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
- --stage sft \
- --do_train \
- --model_name_or_path TheBloke/Llama-2-7B-GPTQ \
- --dataset alpaca_gpt4_en,glaive_toolcall \
- --dataset_dir ../../data \
- --template default \
- --finetuning_type lora \
- --lora_target q_proj,v_proj \
- --output_dir ../../saves/LLaMA2-7B/lora/sft \
- --overwrite_cache \
- --overwrite_output_dir \
- --cutoff_len 1024 \
- --per_device_train_batch_size 1 \
- --per_device_eval_batch_size 1 \
- --gradient_accumulation_steps 8 \
- --lr_scheduler_type cosine \
- --logging_steps 10 \
- --save_steps 100 \
- --eval_steps 100 \
- --evaluation_strategy steps \
- --load_best_model_at_end \
- --learning_rate 5e-5 \
- --num_train_epochs 3.0 \
- --max_samples 3000 \
- --val_size 0.1 \
- --plot_loss \
- --fp16
diff --git a/examples/qlora_single_gpu/llama3_lora_sft_aqlm.yaml b/examples/qlora_single_gpu/llama3_lora_sft_aqlm.yaml
new file mode 100644
index 00000000..2bd99740
--- /dev/null
+++ b/examples/qlora_single_gpu/llama3_lora_sft_aqlm.yaml
@@ -0,0 +1,27 @@
+# model
+model_name_or_path: BlackSamorez/Llama-2-7b-AQLM-2Bit-1x16-hf
+
+# method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: alpaca_gpt4_en,glaive_toolcall
+dataset_dir: data
+template: default
+cutoff_len: 1024
+max_samples: 3000
+val_size: 0.1
+overwrite_cache: true
+
+# output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 100
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 5e-5
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 100
+load_best_model_at_end: true
diff --git a/examples/qlora_single_gpu/llama3_lora_sft_awq.yaml b/examples/qlora_single_gpu/llama3_lora_sft_awq.yaml
new file mode 100644
index 00000000..e69de29b
diff --git a/examples/qlora_single_gpu/llama3_lora_sft_bitsandbytes.yaml b/examples/qlora_single_gpu/llama3_lora_sft_bitsandbytes.yaml
new file mode 100644
index 00000000..e69de29b
diff --git a/examples/qlora_single_gpu/llama3_lora_sft_gptq.yaml b/examples/qlora_single_gpu/llama3_lora_sft_gptq.yaml
new file mode 100644
index 00000000..e69de29b
diff --git a/setup.py b/setup.py
index f7589eb8..7b849942 100644
--- a/setup.py
+++ b/setup.py
@@ -20,12 +20,12 @@ def get_requires():
extra_require = {
- "deepspeed": ["deepspeed>=0.10.0"],
"metrics": ["nltk", "jieba", "rouge-chinese"],
+ "deepspeed": ["deepspeed>=0.10.0"],
+ "bitsandbytes": ["bitsandbytes>=0.39.0"],
+ "vllm": ["vllm>=0.4.0"],
"galore": ["galore-torch"],
"badam": ["badam"],
- "vllm": ["vllm>=0.4.0"],
- "bitsandbytes": ["bitsandbytes>=0.39.0"],
"gptq": ["optimum>=1.16.0", "auto-gptq>=0.5.0"],
"awq": ["autoawq"],
"aqlm": ["aqlm[gpu]>=1.1.0"],
diff --git a/src/webui.py b/src/webui.py
new file mode 100644
index 00000000..c225c710
--- /dev/null
+++ b/src/webui.py
@@ -0,0 +1,9 @@
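+# Minimal entry point for the LLaMA Board GUI; roughly what `llamafactory-cli webui` launches.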
+from llmtuner.webui.interface import create_ui
+
+
+def main():
+ create_ui().queue().launch(server_name="0.0.0.0", server_port=None, share=False)
+
+
+if __name__ == "__main__":
+ main()