Commit Graph

965 Commits

Author SHA1 Message Date
hiyouga 5c62881c5a fix bug in galore optimizer 2024-04-21 18:53:22 +08:00
hiyouga f58425ab45 fix mod stuff 2024-04-21 18:11:10 +08:00
hoshi-hiyouga d0273787be Merge pull request #3338 from astramind-ai/main: Adding Mixture of Depth 2024-04-21 18:05:52 +08:00
hoshi-hiyouga 1fa287fd63 fix #3348 2024-04-20 10:34:09 +08:00
hiyouga ba559a659a fix #3352 2024-04-19 22:40:01 +08:00
hiyouga 14a605a2da fix llama3 template 2024-04-19 15:46:51 +08:00
Marco 4fb7e046b3 fix small typo 2024-04-18 20:33:29 +02:00
Marco 620add7b9f Added Mixture of Depths 2024-04-18 20:31:24 +02:00
hoshi-hiyouga 2aaaede247 support llama3 2024-04-19 01:13:50 +08:00
hiyouga 942362d008 fix #3324 2024-04-18 15:34:45 +08:00
hiyouga 3b43a3b7c5 tiny fix 2024-04-18 00:22:17 +08:00
hiyouga cab0598fd0 add mixtral 8x22B models 2024-04-17 23:35:59 +08:00
hiyouga 5f86053d75 add CodeQwen models 2024-04-17 23:27:22 +08:00
hiyouga c9a477322d fix #3316 2024-04-17 22:54:34 +08:00
hiyouga 6d641af703 fix #3317 2024-04-17 22:17:19 +08:00
hiyouga 278c5e10c4 lint 2024-04-16 18:21:09 +08:00
hoshi-hiyouga aa3206ec26 Merge pull request #3291 from codemayq/main: support for previewing custom dataset in directory format 2024-04-16 18:12:09 +08:00
hiyouga c00f0771a5 Update parser.py 2024-04-16 18:09:31 +08:00
hiyouga 5d62a51c12 update readme and gradio version 2024-04-16 18:09:16 +08:00
hiyouga e3d8fc75eb support badam for all stages 2024-04-16 17:44:48 +08:00
hoshi-hiyouga 4d660c5ade Merge pull request #3287 from Ledzy/badam: [Feature] Add BAdam algorithm 2024-04-16 17:32:16 +08:00
hoshi-hiyouga c9828f4c6e Update utils.py 2024-04-16 17:30:12 +08:00
hoshi-hiyouga 6700a1b9fa Update trainer.py 2024-04-16 17:29:52 +08:00
hoshi-hiyouga 38a56706e0 Update utils.py 2024-04-16 17:29:30 +08:00
hoshi-hiyouga a950f3b81d Update patcher.py 2024-04-16 17:29:19 +08:00
hoshi-hiyouga 750cdf2e74 Update adapter.py 2024-04-16 17:28:12 +08:00
hoshi-hiyouga 4660703674 Update parser.py 2024-04-16 17:27:25 +08:00
hoshi-hiyouga 5b59ff4212 Update parser.py 2024-04-16 17:27:02 +08:00
hoshi-hiyouga ec899cccf3 Update finetuning_args.py 2024-04-16 17:26:30 +08:00
Jonery 7ecb61822b resolve gradient checkpointing issue. 2024-04-16 12:05:27 +08:00
codingma 62294289dc add check 2024-04-16 10:56:39 +08:00
codingma 75aa6392e8 support for previewing custom dataset in directory format 2024-04-16 10:43:14 +08:00
hiyouga b3ac14ffc4 add empty template 2024-04-16 03:10:02 +08:00
hiyouga 7dc72fb58c support unsloth 2024.4 2024-04-16 00:25:03 +08:00
hiyouga 6543f3d449 add codegemma 2024-04-16 00:11:15 +08:00
hiyouga e0dbac2845 support cohere commandR #3184 2024-04-15 23:26:42 +08:00
Jonery 06c8908d3f Feature BAdam 2024-04-15 23:15:27 +08:00
hoshi-hiyouga 7a8ae3f4ac Merge pull request #3254 from marko1616/feature/Add-support-for-CohereForAI/c4ai-command-r-plus: Add template&support for c4ai-command-r/plus (tested) 2024-04-15 22:59:35 +08:00
hoshi-hiyouga 3ccf0d0977 Update template.py 2024-04-15 22:58:01 +08:00
hoshi-hiyouga 268f53dddb Update constants.py 2024-04-15 22:56:55 +08:00
hiyouga cce52351b5 update examples 2024-04-15 22:14:34 +08:00
marko1616 2c89b38720 change default_system according to official template 2024-04-15 20:45:46 +08:00
marko1616 90c5dddf9a Revert "Add support for function call(Not strictly following origin)" (This reverts commit d7b9bbc8b9.) 2024-04-15 20:27:09 +08:00
marko1616 d7b9bbc8b9 Add support for function call(Not strictly following origin) 2024-04-15 20:16:52 +08:00
hoshi-hiyouga 0e0942d388 Merge pull request #3276 from liu-zichen/fix_mixtral: fix: turn on output_router_logits of mixtral 2024-04-15 15:38:16 +08:00
hiyouga efc345c4b0 fix #3273 2024-04-15 15:32:58 +08:00
liuzc 9f4fe62386 fix: mixtral output_router_logits 2024-04-15 12:11:49 +08:00
marko1616 ab033dac4f Typo fix 2024-04-13 17:30:21 +08:00
marko1616 42806323f0 Typo fix 2024-04-13 07:52:11 +08:00
marko1616 d0705518ee Add c4ai-command-r-plus link 2024-04-13 07:32:40 +08:00
marko1616 6574a721d2 Add template&support(Not tested) 2024-04-13 04:31:33 +08:00
hiyouga c53a11b6fd fix model card 2024-04-12 17:11:59 +08:00
hiyouga 232642a621 fix #3238 2024-04-12 14:28:11 +08:00
hiyouga 3dfe4cf611 set dev version 2024-04-11 20:27:34 +08:00
hiyouga 9d4c949461 release v0.6.2 2024-04-11 20:08:51 +08:00
hiyouga 51d0a1a19e Merge branch 'main' of https://github.com/hiyouga/LLaMA-Factory 2024-04-10 23:58:18 +08:00
hiyouga a99f5ed0b6 fix #3225 2024-04-10 23:57:59 +08:00
hoshi-hiyouga 98bc97d8d2 Update adapter.py 2024-04-10 00:57:51 +08:00
hoshi-hiyouga 2111b586b6 Update adapter.py 2024-04-10 00:57:30 +08:00
Erich Schubert b5eefe5c4c Pass additional_target to unsloth (Fixes #3200) 2024-04-09 17:53:40 +02:00
hiyouga 7f6c2486b8 fix quant infer and qwen2moe 2024-04-09 17:12:59 +08:00
hiyouga 9a99fbc86d tiny fix 2024-04-08 21:28:39 +08:00
hoshi-hiyouga 4c6c4a0d88 Merge pull request #3161 from hiyouga/feature/add-mediatek-model: support Breeze-7B 2024-04-08 20:56:51 +08:00
codingma 7b76b4ca08 add empty line 2024-04-07 18:28:08 +08:00
codingma 34bdcba017 rename template to breeze 2024-04-07 18:27:20 +08:00
codingma 5a780e9eec rename template to breeze 2024-04-07 11:39:54 +08:00
codingma 2565a32bd9 support https://github.com/hiyouga/LLaMA-Factory/issues/3152 2024-04-07 11:34:01 +08:00
sliderSun 1d117b7bb6 fix spell error 2024-04-07 10:59:15 +08:00
sliderSun 21650d467c support Qwen1.5-32B 2024-04-07 10:56:03 +08:00
sliderSun 77044d9ef4 support Qwen1.5-32B 2024-04-07 10:26:13 +08:00
hiyouga a6d943804b tiny fix 2024-04-04 02:19:03 +08:00
hiyouga 4b920f24d3 back to gradio 4.21 and fix chat 2024-04-04 02:07:20 +08:00
hiyouga 5ddcecda50 fix bug in latest gradio 2024-04-04 00:55:31 +08:00
hiyouga 7f6e412604 fix requires for windows 2024-04-03 21:56:43 +08:00
hiyouga 148bda353f fix resize vocab at inference #3022 2024-04-03 18:14:24 +08:00
hiyouga ce77d98872 fix #3116 2024-04-03 14:47:59 +08:00
hiyouga 92dab8a90b simplify readme 2024-04-02 20:07:43 +08:00
hiyouga b267aeb53f add moe aux loss control #3085 2024-04-02 14:26:31 +08:00
hiyouga 9ddbe2866a fix #3022 2024-04-02 13:58:39 +08:00
hiyouga dd73a0c248 set dev version 2024-04-01 23:24:08 +08:00
hiyouga 4a6ca621c0 fix #3083 2024-04-01 22:53:52 +08:00
hiyouga 54b7d34908 add qwen1.5 moe 2024-04-01 21:49:40 +08:00
hiyouga aee634cd20 fix #3077 2024-04-01 21:35:18 +08:00
hiyouga eb259cc573 support infer 4bit model on GPUs #3023 2024-04-01 17:34:04 +08:00
hiyouga d0842f6828 update webui 2024-04-01 16:23:28 +08:00
hiyouga 816d714146 fix ORPO loss 2024-04-01 14:42:41 +08:00
hiyouga 5b9b40403d fix IPO and ORPO loss 2024-04-01 14:37:53 +08:00
hiyouga 5907216a1c fix plots 2024-03-31 19:43:48 +08:00
hiyouga 68aaa4904b use log1p in orpo loss (https://github.com/huggingface/trl/pull/1491) 2024-03-31 19:27:08 +08:00
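The log1p change referenced in the commit above (TRL PR 1491) concerns computing the ORPO odds-ratio term in a numerically stable way. A minimal sketch of the idea follows, using scalar log-probabilities; the function names are illustrative assumptions, not LLaMA-Factory's actual implementation:

```python
import math

def log_odds(logp: float) -> float:
    """log(p / (1 - p)) from a log-probability logp = log(p).

    math.log1p(-math.exp(logp)) computes log(1 - p) without first
    materializing 1 - p, which loses precision when p is near 0.
    """
    return logp - math.log1p(-math.exp(logp))

def orpo_odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    # -log(sigmoid(z)) for the log-odds ratio z, written as log1p(exp(-z))
    z = log_odds(logp_chosen) - log_odds(logp_rejected)
    return math.log1p(math.exp(-z))
```

When the chosen and rejected responses are equally likely, z = 0 and the loss is log 2; the loss shrinks as the chosen response becomes more probable than the rejected one.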
hiyouga 099db6acc0 update readme 2024-03-31 18:46:34 +08:00
hiyouga 5195add324 support orpo in webui 2024-03-31 18:34:59 +08:00
hiyouga 17bf8a2c3a support ORPO 2024-03-31 18:29:50 +08:00
hiyouga 27776c3474 tiny fix 2024-03-31 00:10:29 +08:00
marko1616 d9a5134617 fix blank line contains whitespace 2024-03-30 23:46:55 +08:00
marko1616 eb178eaff3 Fix Llama model save for full param train 2024-03-30 23:45:04 +08:00
hiyouga 7a086ed333 support save args in webui #2807 #3046 (some ideas are borrowed from @marko1616) 2024-03-30 23:09:12 +08:00
hiyouga 831c5321ac upgrade gradio to 4.21.0 2024-03-30 20:37:08 +08:00
hiyouga ca793028c6 release v0.6.1 2024-03-29 11:36:08 +08:00
hiyouga 8d603f8820 fix #2982 2024-03-28 20:22:31 +08:00
hiyouga b19c14870d fix #3010 2024-03-28 18:31:17 +08:00
hiyouga 8c77b10912 update trainers 2024-03-28 18:16:27 +08:00
hoshi-hiyouga 3bcd41b639 fix ds optimizer 2024-03-26 23:39:56 +08:00
hiyouga 3164b4f11b fix bug 2024-03-26 17:30:12 +08:00
hiyouga 511f675402 fix #2961 2024-03-26 17:26:14 +08:00
hiyouga ba70aca8fb release v0.6.0 (real) 2024-03-25 23:37:48 +08:00
hiyouga 98a42cbdaa tiny fix 2024-03-25 23:28:52 +08:00
hiyouga 1484f76a95 add arg check 2024-03-25 22:42:58 +08:00
hiyouga 6f2b563f12 release v0.6.0 2024-03-25 22:38:56 +08:00
hiyouga 558a538724 tiny fix 2024-03-25 21:18:08 +08:00
marko1616 c8f0d99704 pass ruff check 2024-03-24 16:12:10 +08:00
marko1616 6f080fdba3 fix Llama lora merge crash 2024-03-24 03:06:11 +08:00
marko1616 51349ea1cc fix Llama lora merge crash 2024-03-24 02:55:23 +08:00
marko1616 c1e2c4ea45 fix Llama lora merge crash 2024-03-24 02:44:35 +08:00
hiyouga 140ad4ad56 fix #2936 2024-03-24 00:43:21 +08:00
hiyouga 7afbc85dae fix #2928 2024-03-24 00:34:54 +08:00
hiyouga a1c8c98c5f fix #2941 2024-03-24 00:28:44 +08:00
hiyouga 8408225162 support fsdp + qlora 2024-03-21 00:36:06 +08:00
hiyouga 9bec3c98a2 fix #2777 #2895 2024-03-20 17:59:45 +08:00
hiyouga 7b8f502901 fix #2346 2024-03-20 17:56:33 +08:00
hiyouga 8e04794b2d fix packages 2024-03-17 22:32:03 +08:00
hiyouga 85c376fc1e fix patcher 2024-03-15 19:18:42 +08:00
hoshi-hiyouga 113cc04719 Merge pull request #2849 from S3Studio/DockerizeSupport: Improve Dockerize support 2024-03-15 19:16:02 +08:00
hiyouga 6bc2c23b6d fix export 2024-03-15 15:06:30 +08:00
S3Studio e75407febd Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically. However, if the host machine's GPU is not compatible with the library, an exception is raised during training: "FlashAttention only supports Ampere GPUs or newer." So if the --flash_attn flag is not set, an additional patch to the qwen model's config is necessary to change the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
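The config workaround described in the commit above could be sketched as follows. The `use_flash_attn` field matches the one named in the commit message; the helper name and the stand-in config object are illustrative assumptions, not the repository's actual code:

```python
from types import SimpleNamespace

def patch_qwen_flash_attn(config):
    """Force flash attention off when the config leaves it on "auto".

    Useful when flash-attn is installed in the image but the host GPU
    (pre-Ampere) cannot run it, which would otherwise raise at train time.
    """
    if getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False
    return config

# Stand-in for a loaded Qwen model config whose default is "auto"
config = SimpleNamespace(use_flash_attn="auto")
patch_qwen_flash_attn(config)
```

An explicitly enabled setting is left untouched, so the patch only overrides the "auto" default rather than the user's choice.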
hiyouga 6ebde4f23e tiny fix 2024-03-14 21:19:06 +08:00
hiyouga 3b4a59bfb1 fix export 2024-03-14 18:17:01 +08:00
hiyouga 8172530d54 fix bug 2024-03-13 23:55:31 +08:00
hiyouga 714d936dfb fix bug 2024-03-13 23:43:42 +08:00
hiyouga 72367307df improve lora+ impl. 2024-03-13 23:32:51 +08:00
齐保元 a0965cd62c [FEATURE]: ADD LORA+ ALGORITHM 2024-03-13 19:43:27 +08:00
hiyouga 0b4a5bf509 fix #2817 2024-03-13 12:42:03 +08:00
hiyouga b9f87cdc11 fix #2802 2024-03-13 12:33:45 +08:00
hiyouga 96ce76cd27 fix kv cache 2024-03-13 01:21:50 +08:00
hiyouga 19ef482649 support QDoRA 2024-03-12 22:12:42 +08:00
hiyouga 70a3052dd8 patch for gemma cpt 2024-03-12 21:21:54 +08:00
hiyouga 60cc17f3a8 fix plot issues 2024-03-12 18:41:35 +08:00
hiyouga b3247d6a16 support olmo 2024-03-12 18:30:38 +08:00
hiyouga 8d8956bad5 fix #2802 2024-03-12 17:08:34 +08:00
hiyouga 07f9b754a7 fix #2782 #2798 2024-03-12 15:53:29 +08:00
hiyouga e874c00906 fix #2775 2024-03-11 00:42:54 +08:00
hiyouga 352693e2dc tiny fix 2024-03-11 00:17:18 +08:00
hiyouga be99799413 update parser 2024-03-10 13:35:20 +08:00
hiyouga 8664262cde support layerwise galore 2024-03-10 00:24:11 +08:00
hiyouga 18ffce36b5 fix #2732 2024-03-09 22:37:16 +08:00
hiyouga bdb496644c allow non-packing pretraining 2024-03-09 22:21:46 +08:00
hiyouga 412c52e325 fix #2766 2024-03-09 21:35:24 +08:00
hiyouga af0e370fb1 use default arg for freeze tuning 2024-03-09 06:08:48 +08:00
hiyouga 393c2de27c update hardware requirements 2024-03-09 03:58:18 +08:00
hiyouga e8dd38b7fd fix #2756 , patch #2746 2024-03-09 02:01:26 +08:00
hoshi-hiyouga 516d0ddc66 Merge pull request #2746 from stephen-nju/main: fix deepspeed ppo RuntimeError 2024-03-09 01:37:00 +08:00