Go to file

hoshi-hiyouga 21daf34f0c Merge pull request #221 from mrhan1993/main 根据GLM Efficient Tuning添加中文README，web添加了server_port参数		2023-07-22 13:04:25 +08:00
assets	add datasets	2023-07-19 20:59:15 +08:00
data	根据GLM Efficient Tuning添加中文README，web添加了server_port	2023-07-21 16:57:58 +08:00
src	Merge branch 'hiyouga:main' into main	2023-07-21 17:00:26 +08:00
tests	fix saving custom code	2023-07-16 18:04:41 +08:00
.gitattributes	Initial commit	2023-05-28 18:09:04 +08:00
LICENSE	Initial commit	2023-05-28 18:09:04 +08:00
README.md	根据GLM Efficient Tuning添加中文README，web添加了server_port	2023-07-21 16:57:58 +08:00
README_zh.md	根据GLM Efficient Tuning添加中文README，web添加了server_port	2023-07-21 16:57:58 +08:00
pyproject.toml	modity code structure	2023-07-15 16:54:28 +08:00
requirements.txt	release v0.1.3	2023-07-21 16:48:34 +08:00
setup.py	modity code structure	2023-07-15 16:54:28 +08:00

README.md

LLaMA Efficient Tuning

👋 Join our WeChat.

 English | [中文](README_zh.md)

Changelog

[23/07/19] Now we support training the LLaMA-2 models in this repo. Try --model_name_or_path meta-llama/Llama-2-7b-hf argument to use the LLaMA-2 model. Remember to use --prompt_template llama2 argument when you are using the LLaMA-2-chat model.

[23/07/18] Now we develop an all-in-one Web UI for training, evaluation and inference. Try train_web.py to fine-tune models in your Web browser. Thank @KanadeSiina and @codemayq for their efforts in the development.

[23/07/11] Now we support training the Baichuan-13B model in this repo. Please replace the Baichuan-13B model file with tests/modeling_baichuan.py and try --model_name_or_path path_to_baichuan_model and --lora_target W_pack arguments to train the Baichuan-13B model. Remember to use --prompt_template baichuan argument when you are using the Baichuan-13B-Chat model.

[23/07/09] Now we release FastEdit⚡🩹, an easy-to-use package for editing the factual knowledge of large language models efficiently. Please follow FastEdit if you are interested.

[23/07/07] Now we support training the InternLM-7B model in this repo. Try --model_name_or_path internlm/internlm-7b argument to use the InternLM model. Remember to use --prompt_template intern argument when you are using the InternLM-chat model.

[23/07/05] Now we support training the Falcon-7B/40B models in this repo. Try --model_name_or_path tiiuae/falcon-7b and --lora_target query_key_value arguments to use the Falcon model.

[23/06/29] We provide a reproducible example of training a chat model using instruction-following datasets, see this Hugging Face Repo for details.

[23/06/22] Now we align the demo API with the OpenAI's format where you can insert the fine-tuned model in arbitrary ChatGPT-based applications.

[23/06/15] Now we support training the Baichuan-7B model in this repo. Try --model_name_or_path baichuan-inc/Baichuan-7B and --lora_target W_pack arguments to use the Baichuan-7B model.

[23/06/03] Now we support quantized training and inference (aka QLoRA). Try --quantization_bit 4/8 argument to work with quantized model. (experimental feature)

[23/05/31] Now we support training the BLOOM & BLOOMZ models in this repo. Try --model_name_or_path bigscience/bloomz-7b1-mt and --lora_target query_key_value arguments to use the BLOOMZ model.

Supported Models

LLaMA (7B/13B/33B/65B)
LLaMA-2 (7B/13B/70B)
BLOOM & BLOOMZ (560M/1.1B/1.7B/3B/7.1B/176B)
Falcon (7B/40B)
Baichuan (7B/13B)
InternLM (7B)

Supported Training Approaches

(Continually) pre-training
- Full-parameter tuning
- Partial-parameter tuning
- LoRA
- QLoRA
Supervised fine-tuning
- Full-parameter tuning
- Partial-parameter tuning
- LoRA
- QLoRA
RLHF
- LoRA
- QLoRA

Provided Datasets

Please refer to data/README.md for details.

Some datasets require confirmation before using them, so we recommend logging in with your Hugging Face account using these commands.

pip install --upgrade huggingface_hub
huggingface-cli login

Requirement

Python 3.8+ and PyTorch 1.13.1+
🤗Transformers, Datasets, Accelerate, PEFT and TRL
jieba, rouge-chinese and nltk (used at evaluation)
gradio and matplotlib (used in web_demo.py)
uvicorn, fastapi and sse-starlette (used in api_demo.py)

And powerful GPUs!

If you want to enable quantized LoRA (QLoRA) on the Windows platform, you should install a pre-built version of bitsandbytes library, which supports CUDA 11.1 to 12.1.

pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl

Getting Started

Data Preparation (optional)

Please refer to data/example_dataset for checking the details about the format of dataset files. You can either use a single .json file or a dataset loading script with multiple files to create a custom dataset.

Note: please update data/dataset_info.json to use your custom dataset. About the format of this file, please refer to data/README.md.

Dependence Installation (optional)

git clone https://github.com/hiyouga/LLaMA-Efficient-Tuning.git
conda create -n llama_etuning python=3.10
conda activate llama_etuning
cd LLaMA-Efficient-Tuning
pip install -r requirements.txt

All-in-one Web UI

python src/train_web.py

(Continually) Pre-Training

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage pt \
    --model_name_or_path path_to_your_model \
    --do_train \
    --dataset wiki_demo \
    --finetuning_type lora \
    --output_dir path_to_pt_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Supervised Fine-Tuning

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_your_model \
    --do_train \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --output_dir path_to_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Reward Model Training

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage rm \
    --model_name_or_path path_to_your_model \
    --do_train \
    --dataset comparison_gpt4_en \
    --finetuning_type lora \
    --output_dir path_to_rm_checkpoint \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16

PPO Training (RLHF)

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage ppo \
    --model_name_or_path path_to_your_model \
    --do_train \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --checkpoint_dir path_to_sft_checkpoint \
    --reward_model path_to_rm_checkpoint \
    --output_dir path_to_ppo_checkpoint \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --resume_lora_training False \
    --plot_loss

Distributed Training

accelerate config # configure the environment
accelerate launch src/train_bash.py # arguments (same as above)

Example configuration for full-tuning with DeepSpeed ZeRO-2

compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 4
  gradient_clipping: 0.5
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Evaluation (BLEU and ROUGE_CHINESE)

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_your_model \
    --do_eval \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_eval_result \
    --per_device_eval_batch_size 8 \
    --max_samples 100 \
    --predict_with_generate

We recommend using --per_device_eval_batch_size=1 and --max_target_length 128 at 4/8-bit evaluation.

Predict

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_your_model \
    --do_predict \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_predict_result \
    --per_device_eval_batch_size 8 \
    --max_samples 100 \
    --predict_with_generate

If you want to predict the samples with empty responses, please kindly fill the response column with dummy tokens to ensure the sample will not be discarded throughout the preprocessing phase.

API Demo

python src/api_demo.py \
    --model_name_or_path path_to_your_model \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint

Visit http://localhost:8000/docs for API documentation.

CLI Demo

python src/cli_demo.py \
    --model_name_or_path path_to_your_model \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint

Web Demo

python src/web_demo.py \
    --model_name_or_path path_to_your_model \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint

Export model

python src/export_model.py \
    --model_name_or_path path_to_your_model \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_export

License

This repository is licensed under the Apache-2.0 License.

Please follow the model licenses to use the corresponding model weights:

Citation

If this work is helpful, please kindly cite as:

@Misc{llama-efficient-tuning,
  title = {LLaMA Efficient Tuning},
  author = {hiyouga},
  howpublished = {\url{https://github.com/hiyouga/LLaMA-Efficient-Tuning}},
  year = {2023}
}

Acknowledgement

This repo is a sibling of ChatGLM-Efficient-Tuning. They share a similar code structure of efficient tuning on large language models.