hiyouga
5c62881c5a
fix bug in galore optimizer
2024-04-21 18:53:22 +08:00
hiyouga
f58425ab45
fix mod stuff
2024-04-21 18:11:10 +08:00
hoshi-hiyouga
d0273787be
Merge pull request #3338 from astramind-ai/main
...
Adding Mixture of Depth
2024-04-21 18:05:52 +08:00
hoshi-hiyouga
1fa287fd63
fix #3348
2024-04-20 10:34:09 +08:00
hiyouga
ba559a659a
fix #3352
2024-04-19 22:40:01 +08:00
hiyouga
14a605a2da
fix llama3 template
2024-04-19 15:46:51 +08:00
Marco
4fb7e046b3
fix small typo
2024-04-18 20:33:29 +02:00
Marco
620add7b9f
Added Mixture of Depths
2024-04-18 20:31:24 +02:00
hoshi-hiyouga
2aaaede247
support llama3
2024-04-19 01:13:50 +08:00
hiyouga
942362d008
fix #3324
2024-04-18 15:34:45 +08:00
hiyouga
3b43a3b7c5
tiny fix
2024-04-18 00:22:17 +08:00
hiyouga
cab0598fd0
add mixtral 8x22B models
2024-04-17 23:35:59 +08:00
hiyouga
5f86053d75
add CodeQwen models
2024-04-17 23:27:22 +08:00
hiyouga
c9a477322d
fix #3316
2024-04-17 22:54:34 +08:00
hiyouga
6d641af703
fix #3317
2024-04-17 22:17:19 +08:00
hiyouga
278c5e10c4
lint
2024-04-16 18:21:09 +08:00
hoshi-hiyouga
aa3206ec26
Merge pull request #3291 from codemayq/main
...
support for previewing custom dataset in directory format
2024-04-16 18:12:09 +08:00
hiyouga
c00f0771a5
Update parser.py
2024-04-16 18:09:31 +08:00
hiyouga
5d62a51c12
update readme and gradio version
2024-04-16 18:09:16 +08:00
hiyouga
e3d8fc75eb
support badam for all stages
2024-04-16 17:44:48 +08:00
hoshi-hiyouga
4d660c5ade
Merge pull request #3287 from Ledzy/badam
...
[Feature] Add BAdam algorithm
2024-04-16 17:32:16 +08:00
hoshi-hiyouga
c9828f4c6e
Update utils.py
2024-04-16 17:30:12 +08:00
hoshi-hiyouga
6700a1b9fa
Update trainer.py
2024-04-16 17:29:52 +08:00
hoshi-hiyouga
38a56706e0
Update utils.py
2024-04-16 17:29:30 +08:00
hoshi-hiyouga
a950f3b81d
Update patcher.py
2024-04-16 17:29:19 +08:00
hoshi-hiyouga
750cdf2e74
Update adapter.py
2024-04-16 17:28:12 +08:00
hoshi-hiyouga
4660703674
Update parser.py
2024-04-16 17:27:25 +08:00
hoshi-hiyouga
5b59ff4212
Update parser.py
2024-04-16 17:27:02 +08:00
hoshi-hiyouga
ec899cccf3
Update finetuning_args.py
2024-04-16 17:26:30 +08:00
Jonery
7ecb61822b
resolve gradient checkpointing issue.
2024-04-16 12:05:27 +08:00
codingma
62294289dc
add check
2024-04-16 10:56:39 +08:00
codingma
75aa6392e8
support for previewing custom dataset in directory format
2024-04-16 10:43:14 +08:00
hiyouga
b3ac14ffc4
add empty template
2024-04-16 03:10:02 +08:00
hiyouga
7dc72fb58c
support unsloth 2024.4
2024-04-16 00:25:03 +08:00
hiyouga
6543f3d449
add codegemma
2024-04-16 00:11:15 +08:00
hiyouga
e0dbac2845
support cohere commandR #3184
2024-04-15 23:26:42 +08:00
Jonery
06c8908d3f
Feature BAdam
2024-04-15 23:15:27 +08:00
hoshi-hiyouga
7a8ae3f4ac
Merge pull request #3254 from marko1616/feature/Add-support-for-CohereForAI/c4ai-command-r-plus
...
Add template & support for c4ai-command-r/plus (tested)
2024-04-15 22:59:35 +08:00
hoshi-hiyouga
3ccf0d0977
Update template.py
2024-04-15 22:58:01 +08:00
hoshi-hiyouga
268f53dddb
Update constants.py
2024-04-15 22:56:55 +08:00
hiyouga
cce52351b5
update examples
2024-04-15 22:14:34 +08:00
marko1616
2c89b38720
change default_system according to official template
2024-04-15 20:45:46 +08:00
marko1616
90c5dddf9a
Revert "Add support for function call(Not strictly following origin)"
...
This reverts commit d7b9bbc8b9
.
2024-04-15 20:27:09 +08:00
marko1616
d7b9bbc8b9
Add support for function call (not strictly following origin)
2024-04-15 20:16:52 +08:00
hoshi-hiyouga
0e0942d388
Merge pull request #3276 from liu-zichen/fix_mixtral
...
fix: turn on output_router_logits of mixtral
2024-04-15 15:38:16 +08:00
hiyouga
efc345c4b0
fix #3273
2024-04-15 15:32:58 +08:00
liuzc
9f4fe62386
fix: mixtral output_router_logits
2024-04-15 12:11:49 +08:00
marko1616
ab033dac4f
Typo fix
2024-04-13 17:30:21 +08:00
marko1616
42806323f0
Typo fix
2024-04-13 07:52:11 +08:00
marko1616
d0705518ee
Add c4ai-command-r-plus link
2024-04-13 07:32:40 +08:00
marko1616
6574a721d2
Add template & support (not tested)
2024-04-13 04:31:33 +08:00
hiyouga
c53a11b6fd
fix model card
2024-04-12 17:11:59 +08:00
hiyouga
232642a621
fix #3238
2024-04-12 14:28:11 +08:00
hiyouga
3dfe4cf611
set dev version
2024-04-11 20:27:34 +08:00
hiyouga
9d4c949461
release v0.6.2
2024-04-11 20:08:51 +08:00
hiyouga
51d0a1a19e
Merge branch 'main' of https://github.com/hiyouga/LLaMA-Factory
2024-04-10 23:58:18 +08:00
hiyouga
a99f5ed0b6
fix #3225
2024-04-10 23:57:59 +08:00
hoshi-hiyouga
98bc97d8d2
Update adapter.py
2024-04-10 00:57:51 +08:00
hoshi-hiyouga
2111b586b6
Update adapter.py
2024-04-10 00:57:30 +08:00
Erich Schubert
b5eefe5c4c
Pass additional_target to unsloth
...
Fixes #3200
2024-04-09 17:53:40 +02:00
hiyouga
7f6c2486b8
fix quant infer and qwen2moe
2024-04-09 17:12:59 +08:00
hiyouga
9a99fbc86d
tiny fix
2024-04-08 21:28:39 +08:00
hoshi-hiyouga
4c6c4a0d88
Merge pull request #3161 from hiyouga/feature/add-mediatek-model
...
support Breeze-7B
2024-04-08 20:56:51 +08:00
codingma
7b76b4ca08
add empty line
2024-04-07 18:28:08 +08:00
codingma
34bdcba017
rename template to breeze
2024-04-07 18:27:20 +08:00
codingma
5a780e9eec
rename template to breeze
2024-04-07 11:39:54 +08:00
codingma
2565a32bd9
support https://github.com/hiyouga/LLaMA-Factory/issues/3152
2024-04-07 11:34:01 +08:00
sliderSun
1d117b7bb6
fix spelling error
2024-04-07 10:59:15 +08:00
sliderSun
21650d467c
support Qwen1.5-32B
2024-04-07 10:56:03 +08:00
sliderSun
77044d9ef4
support Qwen1.5-32B
2024-04-07 10:26:13 +08:00
hiyouga
a6d943804b
tiny fix
2024-04-04 02:19:03 +08:00
hiyouga
4b920f24d3
back to gradio 4.21 and fix chat
2024-04-04 02:07:20 +08:00
hiyouga
5ddcecda50
fix bug in latest gradio
2024-04-04 00:55:31 +08:00
hiyouga
7f6e412604
fix requires for windows
2024-04-03 21:56:43 +08:00
hiyouga
148bda353f
fix resize vocab at inference #3022
2024-04-03 18:14:24 +08:00
hiyouga
ce77d98872
fix #3116
2024-04-03 14:47:59 +08:00
hiyouga
92dab8a90b
simplify readme
2024-04-02 20:07:43 +08:00
hiyouga
b267aeb53f
add moe aux loss control #3085
2024-04-02 14:26:31 +08:00
hiyouga
9ddbe2866a
fix #3022
2024-04-02 13:58:39 +08:00
hiyouga
dd73a0c248
set dev version
2024-04-01 23:24:08 +08:00
hiyouga
4a6ca621c0
fix #3083
2024-04-01 22:53:52 +08:00
hiyouga
54b7d34908
add qwen1.5 moe
2024-04-01 21:49:40 +08:00
hiyouga
aee634cd20
fix #3077
2024-04-01 21:35:18 +08:00
hiyouga
eb259cc573
support infer 4bit model on GPUs #3023
2024-04-01 17:34:04 +08:00
hiyouga
d0842f6828
update webui
2024-04-01 16:23:28 +08:00
hiyouga
816d714146
fix ORPO loss
2024-04-01 14:42:41 +08:00
hiyouga
5b9b40403d
fix IPO and ORPO loss
2024-04-01 14:37:53 +08:00
hiyouga
5907216a1c
fix plots
2024-03-31 19:43:48 +08:00
hiyouga
68aaa4904b
use log1p in orpo loss
...
https://github.com/huggingface/trl/pull/1491
2024-03-31 19:27:08 +08:00
hiyouga
099db6acc0
update readme
2024-03-31 18:46:34 +08:00
hiyouga
5195add324
support orpo in webui
2024-03-31 18:34:59 +08:00
hiyouga
17bf8a2c3a
support ORPO
2024-03-31 18:29:50 +08:00
hiyouga
27776c3474
tiny fix
2024-03-31 00:10:29 +08:00
marko1616
d9a5134617
fix blank line containing whitespace
2024-03-30 23:46:55 +08:00
marko1616
eb178eaff3
Fix Llama model save for full param train
2024-03-30 23:45:04 +08:00
hiyouga
7a086ed333
support save args in webui #2807 #3046
...
some ideas are borrowed from @marko1616
2024-03-30 23:09:12 +08:00
hiyouga
831c5321ac
upgrade gradio to 4.21.0
2024-03-30 20:37:08 +08:00
hiyouga
ca793028c6
release v0.6.1
2024-03-29 11:36:08 +08:00
hiyouga
8d603f8820
fix #2982
2024-03-28 20:22:31 +08:00
hiyouga
b19c14870d
fix #3010
2024-03-28 18:31:17 +08:00
hiyouga
8c77b10912
update trainers
2024-03-28 18:16:27 +08:00
hoshi-hiyouga
3bcd41b639
fix ds optimizer
2024-03-26 23:39:56 +08:00
hiyouga
3164b4f11b
fix bug
2024-03-26 17:30:12 +08:00
hiyouga
511f675402
fix #2961
2024-03-26 17:26:14 +08:00
hiyouga
ba70aca8fb
release v0.6.0 (real)
2024-03-25 23:37:48 +08:00
hiyouga
98a42cbdaa
tiny fix
2024-03-25 23:28:52 +08:00
hiyouga
1484f76a95
add arg check
2024-03-25 22:42:58 +08:00
hiyouga
6f2b563f12
release v0.6.0
2024-03-25 22:38:56 +08:00
hiyouga
558a538724
tiny fix
2024-03-25 21:18:08 +08:00
marko1616
c8f0d99704
pass ruff check
2024-03-24 16:12:10 +08:00
marko1616
6f080fdba3
fix Llama lora merge crash
2024-03-24 03:06:11 +08:00
marko1616
51349ea1cc
fix Llama lora merge crash
2024-03-24 02:55:23 +08:00
marko1616
c1e2c4ea45
fix Llama lora merge crash
2024-03-24 02:44:35 +08:00
hiyouga
140ad4ad56
fix #2936
2024-03-24 00:43:21 +08:00
hiyouga
7afbc85dae
fix #2928
2024-03-24 00:34:54 +08:00
hiyouga
a1c8c98c5f
fix #2941
2024-03-24 00:28:44 +08:00
hiyouga
8408225162
support fsdp + qlora
2024-03-21 00:36:06 +08:00
hiyouga
9bec3c98a2
fix #2777 #2895
2024-03-20 17:59:45 +08:00
hiyouga
7b8f502901
fix #2346
2024-03-20 17:56:33 +08:00
hiyouga
8e04794b2d
fix packages
2024-03-17 22:32:03 +08:00
hiyouga
85c376fc1e
fix patcher
2024-03-15 19:18:42 +08:00
hoshi-hiyouga
113cc04719
Merge pull request #2849 from S3Studio/DockerizeSupport
...
Improve Dockerize support
2024-03-15 19:16:02 +08:00
hiyouga
6bc2c23b6d
fix export
2024-03-15 15:06:30 +08:00
S3Studio
e75407febd
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
hiyouga
6ebde4f23e
tiny fix
2024-03-14 21:19:06 +08:00
hiyouga
3b4a59bfb1
fix export
2024-03-14 18:17:01 +08:00
hiyouga
8172530d54
fix bug
2024-03-13 23:55:31 +08:00
hiyouga
714d936dfb
fix bug
2024-03-13 23:43:42 +08:00
hiyouga
72367307df
improve lora+ impl.
2024-03-13 23:32:51 +08:00
齐保元
a0965cd62c
[FEATURE]: ADD LORA+ ALGORITHM
2024-03-13 19:43:27 +08:00
hiyouga
0b4a5bf509
fix #2817
2024-03-13 12:42:03 +08:00
hiyouga
b9f87cdc11
fix #2802
2024-03-13 12:33:45 +08:00
hiyouga
96ce76cd27
fix kv cache
2024-03-13 01:21:50 +08:00
hiyouga
19ef482649
support QDoRA
2024-03-12 22:12:42 +08:00
hiyouga
70a3052dd8
patch for gemma cpt
2024-03-12 21:21:54 +08:00
hiyouga
60cc17f3a8
fix plot issues
2024-03-12 18:41:35 +08:00
hiyouga
b3247d6a16
support olmo
2024-03-12 18:30:38 +08:00
hiyouga
8d8956bad5
fix #2802
2024-03-12 17:08:34 +08:00
hiyouga
07f9b754a7
fix #2782 #2798
2024-03-12 15:53:29 +08:00
hiyouga
e874c00906
fix #2775
2024-03-11 00:42:54 +08:00
hiyouga
352693e2dc
tiny fix
2024-03-11 00:17:18 +08:00
hiyouga
be99799413
update parser
2024-03-10 13:35:20 +08:00
hiyouga
8664262cde
support layerwise galore
2024-03-10 00:24:11 +08:00
hiyouga
18ffce36b5
fix #2732
2024-03-09 22:37:16 +08:00
hiyouga
bdb496644c
allow non-packing pretraining
2024-03-09 22:21:46 +08:00
hiyouga
412c52e325
fix #2766
2024-03-09 21:35:24 +08:00
hiyouga
af0e370fb1
use default arg for freeze tuning
2024-03-09 06:08:48 +08:00
hiyouga
393c2de27c
update hardware requirements
2024-03-09 03:58:18 +08:00
hiyouga
e8dd38b7fd
fix #2756 , patch #2746
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
516d0ddc66
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
2024-03-09 01:37:00 +08:00