diff --git a/README.md b/README.md
index 347ebe7e..d10ef982 100644
--- a/README.md
+++ b/README.md
@@ -276,18 +276,19 @@ huggingface-cli login
| ------------ | ------- | --------- |
| python | 3.8 | 3.10 |
| torch | 1.13.1 | 2.2.0 |
-| transformers | 4.37.2 | 4.39.3 |
-| datasets | 2.14.3 | 2.18.0 |
-| accelerate | 0.27.2 | 0.28.0 |
+| transformers | 4.37.2 | 4.40.1 |
+| datasets | 2.14.3 | 2.19.1 |
+| accelerate | 0.27.2 | 0.30.0 |
| peft | 0.9.0 | 0.10.0 |
-| trl | 0.8.1 | 0.8.1 |
+| trl | 0.8.1 | 0.8.6 |

| Optional | Minimum | Recommend |
| ------------ | ------- | --------- |
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.14.0 |
-| bitsandbytes | 0.39.0 | 0.43.0 |
-| flash-attn | 2.3.0 | 2.5.6 |
+| bitsandbytes | 0.39.0 | 0.43.1 |
+| vllm | 0.4.0 | 0.4.2 |
+| flash-attn | 2.3.0 | 2.5.8 |

### Hardware Requirement

@@ -305,24 +306,15 @@ huggingface-cli login
## Getting Started

-### Data Preparation
-
-Please refer to [data/README.md](data/README.md) for checking the details about the format of dataset files. You can either use datasets on HuggingFace / ModelScope hub or load the dataset in local disk.
-
-> [!NOTE]
-> Please update `data/dataset_info.json` to use your custom dataset.
-
-### Dependence Installation
+### Installation

```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git
-conda create -n llama_factory python=3.10
-conda activate llama_factory
cd LLaMA-Factory
pip install -e .[metrics]
```

-Extra dependencies available: deepspeed, metrics, galore, badam, vllm, bitsandbytes, gptq, awq, aqlm, qwen, modelscope, quality
+Extra dependencies available: metrics, deepspeed, bitsandbytes, vllm, galore, badam, gptq, awq, aqlm, qwen, modelscope, quality
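Several optional extras can be combined in a single install using standard pip extras syntax. A minimal sketch (the extras names come from the list above; pick only what your setup needs):

```bash
# Combine several optional extras in one editable install.
# Quote the argument so the shell does not expand the square brackets.
pip install -e ".[metrics,deepspeed,bitsandbytes]"
```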
For Windows users

@@ -336,19 +328,41 @@ To enable FlashAttention-2 on the Windows platform, you need to install the prec
-### Train with LLaMA Board GUI (powered by [Gradio](https://github.com/gradio-app/gradio))
+### Data Preparation
+
+Please refer to [data/README.md](data/README.md) for details on the format of dataset files. You can either use datasets on the HuggingFace / ModelScope hub or load datasets from your local disk.
+
+> [!NOTE]
+> Please update `data/dataset_info.json` to use your custom dataset.
+
+### Quickstart
+
+The following three commands run LoRA fine-tuning, inference, and merging for the Llama3-8B-Instruct model, respectively.
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_lora_sft.yaml
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+```
+
+See [examples/README.md](examples/README.md) for advanced usage.
+
+> [!TIP]
+> Use `llamafactory-cli help` to show help information.
+
+### Use LLaMA Board GUI (powered by [Gradio](https://github.com/gradio-app/gradio))

> [!IMPORTANT]
-> LLaMA Board GUI only supports training on a single GPU, please use [CLI](#train-with-command-line-interface) for distributed training.
+> LLaMA Board GUI only supports training on a single GPU.

#### Use local environment

```bash
-llamafactory-cli webui
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli webui
```

> [!TIP]
-> To modify the default setting in the LLaMA Board GUI, you can use environment variables, e.g., `export CUDA_VISIBLE_DEVICES=0 GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 GRADIO_SHARE=False` (use `set` command on Windows OS).
+> To modify the default settings of the LLaMA Board GUI, you can use environment variables, e.g., `export GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 GRADIO_SHARE=False` (use the `set` command on Windows).
For Alibaba Cloud users

@@ -389,21 +403,10 @@ docker compose -f ./docker-compose.yml up -d
-### Train with Command Line Interface
-
-See [examples/README.md](examples/README.md) for usage.
-
-> [!TIP]
-> Use `llamafactory-cli train -h` to display arguments description.
-
### Deploy with OpenAI-style API and vLLM

```bash
-CUDA_VISIBLE_DEVICES=0,1 API_PORT=8000 llamafactory-cli api \
-    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
-    --template llama3 \
-    --infer_backend vllm \
-    --vllm_enforce_eager
+CUDA_VISIBLE_DEVICES=0,1 API_PORT=8000 llamafactory-cli api examples/inference/llama3_vllm.yaml
```

### Download from ModelScope Hub
diff --git a/README_zh.md b/README_zh.md
index 8a2fb79b..9c639f2c 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -163,7 +163,7 @@ https://github.com/hiyouga/LLaMA-Factory/assets/16256802/ec36a9dd-37f4-4f72-81bd
| [Yuan](https://huggingface.co/IEITYuan) | 2B/51B/102B | q_proj,v_proj | yuan |

> [!NOTE]
-> **默认模块**应作为 `--lora_target` 参数的默认值,可使用 `--lora_target all` 参数指定全部模块以得到更好的效果。
+> **默认模块**应作为 `--lora_target` 参数的默认值,可使用 `--lora_target all` 参数指定全部模块以取得更好的效果。
>
> 对于所有“基座”(Base)模型,`--template` 参数可以是 `default`, `alpaca`, `vicuna` 等任意值。但“对话”(Instruct/Chat)模型请务必使用**对应的模板**。
>
@@ -276,18 +276,19 @@ huggingface-cli login
| ------------ | ------- | --------- |
| python | 3.8 | 3.10 |
| torch | 1.13.1 | 2.2.0 |
-| transformers | 4.37.2 | 4.39.3 |
-| datasets | 2.14.3 | 2.18.0 |
-| accelerate | 0.27.2 | 0.28.0 |
+| transformers | 4.37.2 | 4.40.1 |
+| datasets | 2.14.3 | 2.19.1 |
+| accelerate | 0.27.2 | 0.30.0 |
| peft | 0.9.0 | 0.10.0 |
-| trl | 0.8.1 | 0.8.1 |
+| trl | 0.8.1 | 0.8.6 |

| 可选项 | 至少 | 推荐 |
| ------------ | ------- | --------- |
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.14.0 |
-| bitsandbytes | 0.39.0 | 0.43.0 |
-| flash-attn | 2.3.0 | 2.5.6 |
+| bitsandbytes | 0.39.0 | 0.43.1 |
+| vllm | 0.4.0 | 0.4.2 |
+| flash-attn | 2.3.0 | 2.5.8 |

### 硬件依赖

@@ -305,24 +306,15 @@ huggingface-cli login
## 如何使用

-### 数据准备
-
-关于数据集文件的格式,请参考 [data/README_zh.md](data/README_zh.md) 的内容。你可以使用 HuggingFace / ModelScope 上的数据集或加载本地数据集。
-
-> [!NOTE]
-> 使用自定义数据集时,请更新 `data/dataset_info.json` 文件。
-
-### 安装依赖
+### 安装 LLaMA Factory

```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git
-conda create -n llama_factory python=3.10
-conda activate llama_factory
cd LLaMA-Factory
pip install -e .[metrics]
```

-可选的额外依赖项:deepspeed、metrics、galore、badam、vllm、bitsandbytes、gptq、awq、aqlm、qwen、modelscope、quality
+可选的额外依赖项:metrics、deepspeed、bitsandbytes、vllm、galore、badam、gptq、awq、aqlm、qwen、modelscope、quality
Windows 用户指南

@@ -336,19 +328,41 @@ pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/downl
-### 利用 LLaMA Board 可视化界面训练(由 [Gradio](https://github.com/gradio-app/gradio) 驱动)
+### 数据准备
+
+关于数据集文件的格式,请参考 [data/README_zh.md](data/README_zh.md) 的内容。你可以使用 HuggingFace / ModelScope 上的数据集或加载本地数据集。
+
+> [!NOTE]
+> 使用自定义数据集时,请更新 `data/dataset_info.json` 文件。
+
+### 快速开始
+
+下面三行命令分别对 Llama3-8B-Instruct 模型进行 LoRA 微调、推理和合并。
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_lora_sft.yaml
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+```
+
+高级用法请参考 [examples/README_zh.md](examples/README_zh.md)。
+
+> [!TIP]
+> 使用 `llamafactory-cli help` 显示使用帮助。
+
+### 使用 LLaMA Board 可视化界面(由 [Gradio](https://github.com/gradio-app/gradio) 驱动)

> [!IMPORTANT]
-> LLaMA Board 可视化界面目前仅支持单 GPU 训练,请使用[命令行接口](#利用命令行接口训练)来进行多 GPU 分布式训练。
+> LLaMA Board 可视化界面目前仅支持单 GPU 训练。

#### 使用本地环境

```bash
-llamafactory-cli webui
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli webui
```

> [!TIP]
-> 您可以使用环境变量来修改 LLaMA Board 可视化界面的默认设置,例如 `export CUDA_VISIBLE_DEVICES=0 GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 GRADIO_SHARE=False`(Windows 系统可使用 `set` 指令)。
+> 您可以使用环境变量来修改 LLaMA Board 可视化界面的默认设置,例如 `export GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860 GRADIO_SHARE=False`(Windows 系统可使用 `set` 指令)。
阿里云用户指南

@@ -389,21 +403,10 @@ docker compose -f ./docker-compose.yml up -d
-### 利用命令行接口训练
-
-使用方法请参考 [examples/README_zh.md](examples/README_zh.md)。
-
-> [!TIP]
-> 您可以执行 `llamafactory-cli train -h` 来查看参数文档。
-
### 利用 vLLM 部署 OpenAI API

```bash
-CUDA_VISIBLE_DEVICES=0,1 API_PORT=8000 llamafactory-cli api \
-    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
-    --template llama3 \
-    --infer_backend vllm \
-    --vllm_enforce_eager
+CUDA_VISIBLE_DEVICES=0,1 API_PORT=8000 llamafactory-cli api examples/inference/llama3_vllm.yaml
```

### 从魔搭社区下载
diff --git a/examples/README.md b/examples/README.md
index 895e9c72..0a14c5bd 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -1,9 +1,16 @@
We provide diverse examples about fine-tuning LLMs.

+```bash
+export CUDA_VISIBLE_DEVICES=0
+cd examples/lora_single_gpu
+llamafactory-cli train llama3_lora_pretrain.yaml # Do continuous pre-training using LoRA
+
+```
+
```
examples/
├── lora_single_gpu/
-│   ├── pretrain.sh: Do continuous pre-training using LoRA
+│   ├── llama3_lora_pretrain.yaml: Do continuous pre-training using LoRA
│   ├── sft.sh: Do supervised fine-tuning using LoRA
│   ├── reward.sh: Do reward modeling using LoRA
│   ├── ppo.sh: Do PPO training using LoRA
diff --git a/examples/extras/badam/sft.sh b/examples/extras/badam/sft.sh
index 4bcfe9d2..61167dad 100644
--- a/examples/extras/badam/sft.sh
+++ b/examples/extras/badam/sft.sh
@@ -10,7 +10,7 @@ CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
    --finetuning_type full \
    --use_badam \
    --badam_switch_mode descending \
-    --badam_switch_interval 50 \
+    --badam_switch_block_every 50 \
    --badam_verbose 2 \
    --output_dir ../../../saves/LLaMA2-7B/badam/sft \
    --overwrite_cache \
diff --git a/examples/inference/api_demo.sh b/examples/inference/api_demo.sh
deleted file mode 100644
index 6f0f1b2e..00000000
--- a/examples/inference/api_demo.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 API_PORT=8000 llamafactory-cli api \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
-    --template default \
-    --finetuning_type lora
diff --git a/examples/inference/cli_demo.sh b/examples/inference/cli_demo.sh
deleted file mode 100644
index bc762411..00000000
--- a/examples/inference/cli_demo.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
-    --template default \
-    --finetuning_type lora
diff --git a/examples/inference/evaluate.sh b/examples/inference/evaluate.sh
deleted file mode 100644
index 5030329d..00000000
--- a/examples/inference/evaluate.sh
+++ /dev/null
@@ -1,12 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli eval \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
-    --template fewshot \
-    --finetuning_type lora \
-    --task mmlu \
-    --split test \
-    --lang en \
-    --n_shot 5 \
-    --batch_size 4
diff --git a/examples/inference/llama3.yaml b/examples/inference/llama3.yaml
new file mode 100644
index 00000000..ffc5be82
--- /dev/null
+++ b/examples/inference/llama3.yaml
@@ -0,0 +1,2 @@
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+template: llama3
diff --git a/examples/inference/llama3_lora_sft.yaml b/examples/inference/llama3_lora_sft.yaml
new file mode 100644
index 00000000..262f4445
--- /dev/null
+++ b/examples/inference/llama3_lora_sft.yaml
@@ -0,0 +1,4 @@
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+template: llama3
+finetuning_type: lora
diff --git a/examples/inference/llama3_vllm.yaml b/examples/inference/llama3_vllm.yaml
new file mode 100644
index 00000000..8dd3b61a
--- /dev/null
+++ b/examples/inference/llama3_vllm.yaml
@@ -0,0 +1,4 @@
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+template: llama3
+infer_backend: vllm
+vllm_enforce_eager: true
diff --git a/examples/inference/web_demo.sh b/examples/inference/web_demo.sh
deleted file mode 100644
index a58cd2a0..00000000
--- a/examples/inference/web_demo.sh
+++ /dev/null
@@ -1,8 +0,0 @@
-#!/bin/bash
-# add `--visual_inputs True` to load MLLM
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli webchat \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
-    --template default \
-    --finetuning_type lora
diff --git a/examples/lora_single_gpu/dpo.sh b/examples/lora_single_gpu/dpo.sh
deleted file mode 100644
index 2cb6cb01..00000000
--- a/examples/lora_single_gpu/dpo.sh
+++ /dev/null
@@ -1,35 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
-    --stage dpo \
-    --do_train \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
-    --create_new_adapter \
-    --dataset orca_rlhf \
-    --dataset_dir ../../data \
-    --template default \
-    --finetuning_type lora \
-    --lora_target q_proj,v_proj \
-    --output_dir ../../saves/LLaMA2-7B/lora/dpo \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 1024 \
-    --preprocessing_num_workers 16 \
-    --per_device_train_batch_size 1 \
-    --per_device_eval_batch_size 1 \
-    --gradient_accumulation_steps 8 \
-    --lr_scheduler_type cosine \
-    --logging_steps 10 \
-    --warmup_steps 20 \
-    --save_steps 100 \
-    --eval_steps 100 \
-    --evaluation_strategy steps \
-    --load_best_model_at_end \
-    --learning_rate 1e-5 \
-    --num_train_epochs 1.0 \
-    --max_samples 1000 \
-    --val_size 0.1 \
-    --dpo_ftx 1.0 \
-    --plot_loss \
-    --fp16
diff --git a/examples/lora_single_gpu/llama3_lora_dpo.yaml b/examples/lora_single_gpu/llama3_lora_dpo.yaml
new file mode 100644
index 00000000..f71f752d
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_dpo.yaml
@@ -0,0 +1,39 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+
+# method
+stage: dpo
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+dpo_ftx: 1.0
+
+# dataset
+dataset: orca_rlhf
+template: llama3
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/dpo
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.00001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 500
diff --git a/examples/lora_single_gpu/llama3_lora_eval.yaml b/examples/lora_single_gpu/llama3_lora_eval.yaml
new file mode 100644
index 00000000..5808a47a
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_eval.yaml
@@ -0,0 +1,19 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+
+# method
+finetuning_type: lora
+
+# dataset
+task: mmlu
+split: test
+template: fewshot
+lang: en
+n_shot: 5
+
+# output
+save_dir: saves/llama3-8b/lora/eval
+
+# eval
+batch_size: 4
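The `llama3_vllm.yaml` config above backs the `llamafactory-cli api` deployment command shown earlier in this patch. Since the server speaks the OpenAI chat-completions protocol, a quick smoke test might look like this (a sketch assuming the default host and `API_PORT=8000`; the `model` field is required by the OpenAI schema but is typically ignored by this single-model server):

```bash
# Query the OpenAI-style chat completions route exposed by the API server.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'
```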
diff --git a/examples/lora_single_gpu/llama3_lora_orpo.yaml b/examples/lora_single_gpu/llama3_lora_orpo.yaml
new file mode 100644
index 00000000..5d78d260
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_orpo.yaml
@@ -0,0 +1,38 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+
+# method
+stage: orpo
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: orca_rlhf
+template: llama3
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/orpo
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.00001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 500
diff --git a/examples/lora_single_gpu/llama3_lora_ppo.yaml b/examples/lora_single_gpu/llama3_lora_ppo.yaml
new file mode 100644
index 00000000..8d78d20d
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_ppo.yaml
@@ -0,0 +1,38 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+reward_model: saves/llama3-8b/lora/reward
+
+# method
+stage: ppo
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: identity,alpaca_gpt4_en
+template: llama3
+cutoff_len: 1024
+max_samples: 1000
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/ppo
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.00001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# generate
+max_new_tokens: 512
+top_k: 0
+top_p: 0.9
diff --git a/examples/lora_single_gpu/llama3_lora_predict.yaml b/examples/lora_single_gpu/llama3_lora_predict.yaml
new file mode 100644
index 00000000..5a9de686
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_predict.yaml
@@ -0,0 +1,24 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+
+# method
+stage: sft
+do_predict: true
+finetuning_type: lora
+
+# dataset
+dataset: identity,alpaca_gpt4_en
+template: llama3
+cutoff_len: 1024
+max_samples: 50
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/predict
+overwrite_output_dir: true
+
+# eval
+per_device_eval_batch_size: 1
+predict_with_generate: true
diff --git a/examples/lora_single_gpu/llama3_lora_pretrain.yaml b/examples/lora_single_gpu/llama3_lora_pretrain.yaml
new file mode 100644
index 00000000..64245b71
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_pretrain.yaml
@@ -0,0 +1,37 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+
+# method
+stage: pt
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: c4_demo
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/pretrain
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.0001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 500
diff --git a/examples/lora_single_gpu/llama3_lora_reward.yaml b/examples/lora_single_gpu/llama3_lora_reward.yaml
new file mode 100644
index 00000000..f190f4ac
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_reward.yaml
@@ -0,0 +1,38 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+
+# method
+stage: rm
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: orca_rlhf
+template: llama3
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/reward
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.00001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 500
diff --git a/examples/lora_single_gpu/llama3_lora_sft.yaml b/examples/lora_single_gpu/llama3_lora_sft.yaml
new file mode 100644
index 00000000..f99df305
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_lora_sft.yaml
@@ -0,0 +1,38 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+
+# method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: identity,alpaca_gpt4_en
+template: llama3
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llama3-8b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.0001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 500
diff --git a/examples/lora_single_gpu/llama3_preprocess.yaml b/examples/lora_single_gpu/llama3_preprocess.yaml
new file mode 100644
index 00000000..04df9631
--- /dev/null
+++ b/examples/lora_single_gpu/llama3_preprocess.yaml
@@ -0,0 +1,22 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+
+# method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: identity,alpaca_gpt4_en
+template: llama3
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+tokenized_path: saves/llama3-8b/dataset/sft # use `tokenized_path` in config to load data
+
+# output
+output_dir: saves/llama3-8b/lora/sft
+overwrite_output_dir: true
diff --git a/examples/lora_single_gpu/llava1_5_lora_sft.yaml b/examples/lora_single_gpu/llava1_5_lora_sft.yaml
new file mode 100644
index 00000000..96c2701a
--- /dev/null
+++ b/examples/lora_single_gpu/llava1_5_lora_sft.yaml
@@ -0,0 +1,39 @@
+# model
+model_name_or_path: llava-hf/llava-1.5-7b-hf
+visual_inputs: true
+
+# method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_target: q_proj,v_proj
+
+# dataset
+dataset: mllm_demo
+template: vicuna
+cutoff_len: 1024
+max_samples: 1000
+val_size: 0.1
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+# output
+output_dir: saves/llava1_5-7b/lora/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+
+# train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 8
+learning_rate: 0.0001
+num_train_epochs: 3.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+fp16: true
+
+# eval
+per_device_eval_batch_size: 1
+evaluation_strategy: steps
+eval_steps: 500
diff --git a/examples/lora_single_gpu/orpo.sh b/examples/lora_single_gpu/orpo.sh
deleted file mode 100644
index 335707bf..00000000
--- a/examples/lora_single_gpu/orpo.sh
+++ /dev/null
@@ -1,32 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
-    --stage orpo \
-    --do_train \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --dataset orca_rlhf \
-    --dataset_dir ../../data \
-    --template default \
-    --finetuning_type lora \
-    --lora_target q_proj,v_proj \
-    --output_dir ../../saves/LLaMA2-7B/lora/orpo \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 1024 \
-    --preprocessing_num_workers 16 \
-    --per_device_train_batch_size 1 \
-    --per_device_eval_batch_size 1 \
-    --gradient_accumulation_steps 8 \
-    --lr_scheduler_type cosine \
-    --logging_steps 10 \
-    --warmup_steps 20 \
-    --save_steps 100 \
-    --eval_steps 100 \
-    --evaluation_strategy steps \
-    --load_best_model_at_end \
-    --learning_rate 1e-5 \
-    --num_train_epochs 1.0 \
-    --max_samples 1000 \
-    --val_size 0.1 \
-    --plot_loss \
-    --fp16
diff --git a/examples/lora_single_gpu/ppo.sh b/examples/lora_single_gpu/ppo.sh
deleted file mode 100644
index 9eccb05e..00000000
--- a/examples/lora_single_gpu/ppo.sh
+++ /dev/null
@@ -1,32 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
-    --stage ppo \
-    --do_train \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
-    --create_new_adapter \
-    --dataset alpaca_gpt4_en \
-    --dataset_dir ../../data \
-    --template default \
-    --finetuning_type lora \
-    --lora_target q_proj,v_proj \
-    --reward_model ../../saves/LLaMA2-7B/lora/reward \
-    --output_dir ../../saves/LLaMA2-7B/lora/ppo \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 512 \
-    --preprocessing_num_workers 16 \
-    --per_device_train_batch_size 1 \
-    --gradient_accumulation_steps 8 \
-    --lr_scheduler_type cosine \
-    --logging_steps 10 \
-    --save_steps 100 \
-    --learning_rate 1e-5 \
-    --num_train_epochs 1.0 \
-    --max_samples 1000 \
-    --top_k 0 \
-    --top_p 0.9 \
-    --max_new_tokens 256 \
-    --plot_loss \
-    --fp16
diff --git a/examples/lora_single_gpu/predict.sh b/examples/lora_single_gpu/predict.sh
deleted file mode 100644
index 250efed1..00000000
--- a/examples/lora_single_gpu/predict.sh
+++ /dev/null
@@ -1,19 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
-    --stage sft \
-    --do_predict \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft,../../saves/LLaMA2-7B/lora/dpo \
-    --dataset alpaca_gpt4_en,glaive_toolcall \
-    --dataset_dir ../../data \
-    --template default \
-    --finetuning_type lora \
-    --output_dir ../../saves/LLaMA2-7B/lora/predict \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 1024 \
-    --preprocessing_num_workers 16 \
-    --per_device_eval_batch_size 1 \
-    --max_samples 20 \
-    --predict_with_generate
diff --git a/examples/lora_single_gpu/prepare.sh b/examples/lora_single_gpu/prepare.sh
deleted file mode 100644
index 277f9b7a..00000000
--- a/examples/lora_single_gpu/prepare.sh
+++ /dev/null
@@ -1,19 +0,0 @@
-#!/bin/bash
-# use `--tokenized_path` in training script to load data
-
-CUDA_VISIBLE_DEVICES= llamafactory-cli train \
-    --stage sft \
-    --do_train \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --dataset alpaca_gpt4_en,glaive_toolcall \
-    --dataset_dir ../../data \
-    --template default \
-    --finetuning_type lora \
-    --lora_target q_proj,v_proj \
-    --output_dir ../../saves/LLaMA2-7B/lora/sft \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 1024 \
-    --preprocessing_num_workers 16 \
-    --max_samples 3000 \
-    --tokenized_path ../../saves/datasets/sft
diff --git a/examples/lora_single_gpu/pretrain.sh b/examples/lora_single_gpu/pretrain.sh
deleted file mode 100644
index 0782f00c..00000000
--- a/examples/lora_single_gpu/pretrain.sh
+++ /dev/null
@@ -1,31 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
-    --stage pt \
-    --do_train \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --dataset c4_demo \
-    --dataset_dir ../../data \
-    --finetuning_type lora \
-    --lora_target q_proj,v_proj \
-    --output_dir ../../saves/LLaMA2-7B/lora/pretrain \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 1024 \
-    --preprocessing_num_workers 16 \
-    --per_device_train_batch_size 1 \
-    --per_device_eval_batch_size 1 \
-    --gradient_accumulation_steps 8 \
-    --lr_scheduler_type cosine \
-    --logging_steps 10 \
-    --warmup_steps 20 \
-    --save_steps 100 \
-    --eval_steps 100 \
-    --evaluation_strategy steps \
-    --load_best_model_at_end \
-    --learning_rate 5e-5 \
-    --num_train_epochs 3.0 \
-    --max_samples 10000 \
-    --val_size 0.1 \
-    --plot_loss \
-    --fp16
diff --git a/examples/lora_single_gpu/reward.sh b/examples/lora_single_gpu/reward.sh
deleted file mode 100644
index 678809fd..00000000
--- a/examples/lora_single_gpu/reward.sh
+++ /dev/null
@@ -1,33 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
-    --stage rm \
-    --do_train \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
-    --create_new_adapter \
-    --dataset orca_rlhf \
-    --dataset_dir ../../data \
-    --template default \
-    --finetuning_type lora \
-    --lora_target q_proj,v_proj \
-    --output_dir ../../saves/LLaMA2-7B/lora/reward \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 1024 \
-    --preprocessing_num_workers 16 \
-    --per_device_train_batch_size 1 \
-    --per_device_eval_batch_size 1 \
-    --gradient_accumulation_steps 8 \
-    --lr_scheduler_type cosine \
-    --logging_steps 10 \
-    --warmup_steps 20 \
-    --save_steps 100 \
-    --eval_steps 100 \
-    --evaluation_strategy steps \
-    --learning_rate 1e-5 \
-    --num_train_epochs 1.0 \
-    --max_samples 5000 \
-    --val_size 0.1 \
-    --plot_loss \
-    --fp16
diff --git a/examples/lora_single_gpu/sft.sh b/examples/lora_single_gpu/sft.sh
deleted file mode 100644
index 2047e21f..00000000
--- a/examples/lora_single_gpu/sft.sh
+++ /dev/null
@@ -1,32 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
-    --stage sft \
-    --do_train \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --dataset alpaca_gpt4_en,glaive_toolcall \
-    --dataset_dir ../../data \
-    --template default \
-    --finetuning_type lora \
-    --lora_target q_proj,v_proj \
-    --output_dir ../../saves/LLaMA2-7B/lora/sft \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 1024 \
-    --preprocessing_num_workers 16 \
-    --per_device_train_batch_size 1 \
-    --per_device_eval_batch_size 1 \
-    --gradient_accumulation_steps 8 \
-    --lr_scheduler_type cosine \
-    --logging_steps 10 \
-    --warmup_steps 20 \
-    --save_steps 100 \
-    --eval_steps 100 \
-    --evaluation_strategy steps \
-    --load_best_model_at_end \
-    --learning_rate 5e-5 \
-    --num_train_epochs 3.0 \
-    --max_samples 3000 \
-    --val_size 0.1 \
-    --plot_loss \
-    --fp16
diff --git a/examples/lora_single_gpu/sft_mllm.sh b/examples/lora_single_gpu/sft_mllm.sh
deleted file mode 100644
index 53e37262..00000000
--- a/examples/lora_single_gpu/sft_mllm.sh
+++ /dev/null
@@ -1,33 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
-    --stage sft \
-    --do_train \
-    --model_name_or_path llava-hf/llava-1.5-7b-hf \
-    --visual_inputs \
-    --dataset mllm_demo \
-    --dataset_dir ../../data \
-    --template vicuna \
-    --finetuning_type lora \
-    --lora_target q_proj,v_proj \
-    --output_dir ../../saves/LLaMA2-7B/lora/sft_mllm \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 1024 \
-    --preprocessing_num_workers 16 \
-    --per_device_train_batch_size 1 \
-    --per_device_eval_batch_size 1 \
-    --gradient_accumulation_steps 8 \
-    --lr_scheduler_type cosine \
-    --logging_steps 10 \
-    --warmup_steps 20 \
-    --save_steps 100 \
-    --eval_steps 100 \
-    --evaluation_strategy steps \
-    --load_best_model_at_end \
-    --learning_rate 5e-5 \
-    --num_train_epochs 100.0 \
-    --max_samples 3000 \
-    --val_size 0.1 \
-    --plot_loss \
-    --fp16
diff --git a/examples/merge_lora/llama3_gptq.yaml b/examples/merge_lora/llama3_gptq.yaml
new file mode 100644
index 00000000..eac12f90
--- /dev/null
+++ b/examples/merge_lora/llama3_gptq.yaml
@@ -0,0 +1,11 @@
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+template: llama3
+
+# export
+export_dir: models/llama3_gptq
+export_quantization_bit: 4
+export_quantization_dataset: data/c4_demo.json
+export_size: 2
+export_device: cpu
+export_legacy_format: false
diff --git a/examples/merge_lora/llama3_lora_sft.yaml b/examples/merge_lora/llama3_lora_sft.yaml
new file mode 100644
index 00000000..508a0b8c
--- /dev/null
+++ b/examples/merge_lora/llama3_lora_sft.yaml
@@ -0,0 +1,13 @@
+# Note: DO NOT use quantized model or quantization_bit when merging lora weights
+
+# model
+model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
+adapter_name_or_path: saves/llama3-8b/lora/sft
+template: llama3
+finetuning_type: lora
+
+# export
+export_dir: models/llama3_lora_sft
+export_size: 2
+export_device: cpu
+export_legacy_format: false
diff --git a/examples/merge_lora/merge.sh b/examples/merge_lora/merge.sh
deleted file mode 100644
index 186e64a4..00000000
--- a/examples/merge_lora/merge.sh
+++ /dev/null
@@ -1,12 +0,0 @@
-#!/bin/bash
-# DO NOT use quantized model or quantization_bit when merging lora weights
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli export \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --adapter_name_or_path ../../saves/LLaMA2-7B/lora/sft \
-    --template default \
-    --finetuning_type lora \
-    --export_dir ../../models/llama2-7b-sft \
-    --export_size 2 \
-    --export_device cpu \
-    --export_legacy_format False
diff --git a/examples/merge_lora/quantize.sh b/examples/merge_lora/quantize.sh
deleted file mode 100644
index 4a104645..00000000
--- a/examples/merge_lora/quantize.sh
+++ /dev/null
@@ -1,11 +0,0 @@
-#!/bin/bash
-# NEED TO run `merge.sh` before using this script
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli export \
-    --model_name_or_path ../../models/llama2-7b-sft \
-    --template default \
-    --export_dir ../../models/llama2-7b-sft-int4 \
-    --export_quantization_bit 4 \
-    --export_quantization_dataset ../../data/c4_demo.json \
-    --export_size 2 \
-    --export_legacy_format False
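Mirroring the deleted merge.sh/quantize.sh pair above, the two-step flow with the new YAML configs would presumably be: merge the adapter first, then point the GPTQ config at the merged checkpoint before quantizing. A sketch, following the `llamafactory-cli <command> <config>.yaml` pattern used elsewhere in this PR:

```bash
# Step 1: merge the LoRA adapter into a standalone model (models/llama3_lora_sft).
CUDA_VISIBLE_DEVICES=0 llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml

# Step 2: quantize. As written, llama3_gptq.yaml quantizes the base Instruct
# model; edit its model_name_or_path to models/llama3_lora_sft to quantize
# the merged checkpoint instead.
CUDA_VISIBLE_DEVICES=0 llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
```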
diff --git a/examples/qlora_single_gpu/aqlm.sh b/examples/qlora_single_gpu/aqlm.sh
deleted file mode 100644
index 1e0a71ca..00000000
--- a/examples/qlora_single_gpu/aqlm.sh
+++ /dev/null
@@ -1,30 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
-    --stage sft \
-    --do_train \
-    --model_name_or_path BlackSamorez/Llama-2-7b-AQLM-2Bit-1x16-hf \
-    --dataset alpaca_gpt4_en,glaive_toolcall \
-    --dataset_dir ../../data \
-    --template default \
-    --finetuning_type lora \
-    --lora_target q_proj,v_proj \
-    --output_dir ../../saves/LLaMA2-7B/lora/sft \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 1024 \
-    --per_device_train_batch_size 1 \
-    --per_device_eval_batch_size 1 \
-    --gradient_accumulation_steps 8 \
-    --lr_scheduler_type cosine \
-    --logging_steps 10 \
-    --save_steps 100 \
-    --eval_steps 100 \
-    --evaluation_strategy steps \
-    --load_best_model_at_end \
-    --learning_rate 5e-5 \
-    --num_train_epochs 3.0 \
-    --max_samples 3000 \
-    --val_size 0.1 \
-    --plot_loss \
-    --fp16
diff --git a/examples/qlora_single_gpu/awq.sh b/examples/qlora_single_gpu/awq.sh
deleted file mode 100644
index c13c8134..00000000
--- a/examples/qlora_single_gpu/awq.sh
+++ /dev/null
@@ -1,30 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
-    --stage sft \
-    --do_train \
-    --model_name_or_path TheBloke/Llama-2-7B-AWQ \
-    --dataset alpaca_gpt4_en,glaive_toolcall \
-    --dataset_dir ../../data \
-    --template default \
-    --finetuning_type lora \
-    --lora_target q_proj,v_proj \
-    --output_dir ../../saves/LLaMA2-7B/lora/sft \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 1024 \
-    --per_device_train_batch_size 1 \
-    --per_device_eval_batch_size 1 \
-    --gradient_accumulation_steps 8 \
-    --lr_scheduler_type cosine \
-    --logging_steps 10 \
-    --save_steps 100 \
-    --eval_steps 100 \
-    --evaluation_strategy steps \
-    --load_best_model_at_end \
-    --learning_rate 5e-5 \
-    --num_train_epochs 3.0 \
-    --max_samples 3000 \
-    --val_size 0.1 \
-    --plot_loss \
-    --fp16
diff --git a/examples/qlora_single_gpu/bitsandbytes.sh b/examples/qlora_single_gpu/bitsandbytes.sh
deleted file mode 100644
index 27f48d41..00000000
--- a/examples/qlora_single_gpu/bitsandbytes.sh
+++ /dev/null
@@ -1,31 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
-    --stage sft \
-    --do_train \
-    --model_name_or_path meta-llama/Llama-2-7b-hf \
-    --dataset alpaca_gpt4_en,glaive_toolcall \
-    --dataset_dir ../../data \
-    --template default \
-    --finetuning_type lora \
-    --lora_target q_proj,v_proj \
-    --output_dir ../../saves/LLaMA2-7B/lora/sft \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 1024 \
-    --per_device_train_batch_size 1 \
-    --per_device_eval_batch_size 1 \
-    --gradient_accumulation_steps 8 \
-    --lr_scheduler_type cosine \
-    --logging_steps 10 \
-    --save_steps 100 \
-    --eval_steps 100 \
-    --evaluation_strategy steps \
-    --load_best_model_at_end \
-    --learning_rate 5e-5 \
-    --num_train_epochs 3.0 \
-    --max_samples 3000 \
-    --val_size 0.1 \
-    --quantization_bit 4 \
-    --plot_loss \
-    --fp16
diff --git a/examples/qlora_single_gpu/gptq.sh b/examples/qlora_single_gpu/gptq.sh
deleted file mode 100644
index 5b1b80e1..00000000
--- a/examples/qlora_single_gpu/gptq.sh
+++ /dev/null
@@ -1,30 +0,0 @@
-#!/bin/bash
-
-CUDA_VISIBLE_DEVICES=0 llamafactory-cli train \
-    --stage sft \
-    --do_train \
-    --model_name_or_path TheBloke/Llama-2-7B-GPTQ \
-    --dataset alpaca_gpt4_en,glaive_toolcall \
-    --dataset_dir ../../data \
-    --template default \
-    --finetuning_type lora \
-    --lora_target q_proj,v_proj \
-    --output_dir ../../saves/LLaMA2-7B/lora/sft \
-    --overwrite_cache \
-    --overwrite_output_dir \
-    --cutoff_len 1024 \
-    --per_device_train_batch_size 1 \
-    --per_device_eval_batch_size 1 \
-    --gradient_accumulation_steps 8 \
-    --lr_scheduler_type cosine \
-    --logging_steps 10 \
-    --save_steps 100 \
-    --eval_steps 100 \
-    --evaluation_strategy steps \
-    --load_best_model_at_end \
-    --learning_rate 5e-5 \
-    --num_train_epochs 3.0 \
-    --max_samples 3000 \
-    --val_size 0.1 \
-    --plot_loss \
-    --fp16
diff --git a/examples/qlora_single_gpu/llama3_lora_sft_aqlm.yaml b/examples/qlora_single_gpu/llama3_lora_sft_aqlm.yaml
new file mode 100644
index 00000000..2bd99740
--- /dev/null
+++ b/examples/qlora_single_gpu/llama3_lora_sft_aqlm.yaml
@@ -0,0 +1,27 @@
+stage: sft
+do_train: true
+model_name_or_path: BlackSamorez/Llama-2-7b-AQLM-2Bit-1x16-hf
+dataset: alpaca_gpt4_en,glaive_toolcall
+dataset_dir: data
+template: default
+finetuning_type: lora
+lora_target: q_proj,v_proj
+output_dir: ../../saves/LLaMA2-7B/lora/sft
+overwrite_cache: true
+overwrite_output_dir: true
+cutoff_len: 1024
+per_device_train_batch_size: 1
+per_device_eval_batch_size: 1
+gradient_accumulation_steps: 8
+lr_scheduler_type: cosine
+logging_steps: 10
+save_steps: 100
+eval_steps: 100
+evaluation_strategy: steps
+load_best_model_at_end: true
+learning_rate: 5e-5
+num_train_epochs: 3.0
+max_samples: 3000
+val_size: 0.1
+plot_loss: true
+fp16: true
diff --git a/examples/qlora_single_gpu/llama3_lora_sft_awq.yaml b/examples/qlora_single_gpu/llama3_lora_sft_awq.yaml
new file mode 100644
index 00000000..e69de29b
diff --git a/examples/qlora_single_gpu/llama3_lora_sft_bitsandbytes.yaml b/examples/qlora_single_gpu/llama3_lora_sft_bitsandbytes.yaml
new file mode 100644
index 00000000..e69de29b
diff --git a/examples/qlora_single_gpu/llama3_lora_sft_gptq.yaml b/examples/qlora_single_gpu/llama3_lora_sft_gptq.yaml
new file mode 100644
index 00000000..e69de29b
diff --git a/setup.py b/setup.py
index f7589eb8..7b849942 100644
--- a/setup.py
+++ b/setup.py
@@ -20,12 +20,12 @@ def get_requires():


extra_require = {
-    "deepspeed": ["deepspeed>=0.10.0"],
    "metrics": ["nltk", "jieba", "rouge-chinese"],
+    "deepspeed": ["deepspeed>=0.10.0"],
+    "bitsandbytes": ["bitsandbytes>=0.39.0"],
+    "vllm": ["vllm>=0.4.0"],
    "galore": ["galore-torch"],
    "badam": ["badam"],
-    "vllm": ["vllm>=0.4.0"],
-    "bitsandbytes": ["bitsandbytes>=0.39.0"],
    "gptq": ["optimum>=1.16.0", "auto-gptq>=0.5.0"],
    "awq": ["autoawq"],
    "aqlm": ["aqlm[gpu]>=1.1.0"],
diff --git a/src/webui.py b/src/webui.py
new file mode 100644
index 00000000..c225c710
--- /dev/null
+++ b/src/webui.py
@@ -0,0 +1,9 @@
+from llmtuner.webui.interface import create_ui
+
+
+def main():
+    create_ui().queue().launch(server_name="0.0.0.0", server_port=None, share=False)
+
+
+if __name__ == "__main__":
+    main()
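The new `src/webui.py` entry point simply wires `create_ui()` into a Gradio launch bound to all interfaces. Assuming the package has been installed with `pip install -e .` so that `llmtuner` is importable, it can be started directly:

```bash
# Launch the LLaMA Board GUI via the new entry script; it binds 0.0.0.0 and,
# with server_port=None as coded above, lets Gradio pick a free port.
CUDA_VISIBLE_DEVICES=0 python src/webui.py
```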