Commit Graph

1401 Commits

Author SHA1 Message Date
hiyouga e43809bced fix #4683 2024-07-05 00:58:05 +08:00
hiyouga ed232311e8 fix #4674 2024-07-05 00:41:03 +08:00
hiyouga 226a9e563f Merge branch 'main' of https://github.com/hiyouga/LLaMA-Factory 2024-07-04 14:23:37 +08:00
hiyouga 1e27e8c776 fix #4677 2024-07-04 14:22:07 +08:00
hzhaoy 738df47748 tiny fix 2024-07-04 10:20:28 +08:00
hiyouga 0c699de39d tiny fix 2024-07-04 03:47:05 +08:00
hiyouga 44747cebd2 tiny fix 2024-07-04 03:02:23 +08:00
hiyouga b5d101e1bf fix data map for packing 2024-07-04 03:01:31 +08:00
hiyouga 6fd6aa4530 fix packing for eager/sdpa attn 2024-07-04 01:52:43 +08:00
hoshi-hiyouga 87d9b2d005 Merge pull request #4224 from chuan298/main (Implement efficient packing without cross-contamination attention) 2024-07-04 01:18:54 +08:00
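The packing approach merged here keeps packed examples from attending to one another by masking attention across example boundaries. Below is a minimal sketch of that idea, assuming examples are simply concatenated and using a block-diagonal boolean mask; it is an illustration, not the repository's actual implementation.

```python
import torch

def block_diagonal_mask(seq_lens: list[int]) -> torch.Tensor:
    """Return a (total_len, total_len) boolean mask that is True only within each
    packed example, assuming examples are concatenated in order."""
    total_len = sum(seq_lens)
    mask = torch.zeros(total_len, total_len, dtype=torch.bool)
    offset = 0
    for length in seq_lens:
        mask[offset : offset + length, offset : offset + length] = True
        offset += length
    return mask

# Three examples of lengths 3, 2 and 4 packed into one sequence of length 9.
print(block_diagonal_mask([3, 2, 4]).int())
```

In practice this block mask would be intersected with the usual causal mask and converted to whatever format the chosen attention backend expects.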
hiyouga cce7083024 update packing 2024-07-04 01:10:55 +08:00
hoshi-hiyouga a36e8f2dd5 Update packing.py 2024-07-03 23:36:01 +08:00
hiyouga c346f79f99 update func name 2024-07-03 23:29:33 +08:00
hiyouga 8a6a7b9c8a update arg name 2024-07-03 23:23:24 +08:00
hiyouga 575a02a23d update hparams 2024-07-03 23:18:58 +08:00
hiyouga 7f770f6895 update ui 2024-07-03 23:13:49 +08:00
hiyouga 8845e94f91 fix #4609 (unwrap_model_for_generation(reward_model) is necessary for zero3 training) 2024-07-03 19:45:51 +08:00
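Under ZeRO-3 the reward model's parameters are sharded across ranks, so they have to be gathered before a full forward or generate call. A hedged sketch of the pattern the commit message refers to, assuming trl's `unwrap_model_for_generation` context manager:

```python
from trl.models.utils import unwrap_model_for_generation

# Sketch: under DeepSpeed ZeRO-3 the context manager temporarily gathers the
# sharded parameters so the reward model can run a normal forward pass.
def score_with_reward_model(reward_model, accelerator, input_ids, attention_mask):
    with unwrap_model_for_generation(reward_model, accelerator) as unwrapped_model:
        return unwrapped_model(input_ids=input_ids, attention_mask=attention_mask)
```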
hiyouga 8b1172b910 tiny fix 2024-07-03 02:31:50 +08:00
hiyouga 71cdf8956e tiny fix 2024-07-02 23:06:13 +08:00
hiyouga 821bb6660e remove rlhf support for chatglm2&3 2024-07-02 23:03:17 +08:00
hiyouga c13ae2df19 upcast logits 2024-07-02 22:32:05 +08:00
hiyouga c47ab6c072 improve rlhf 2024-07-02 22:23:08 +08:00
ancv e8e13b0942 move efficient_packing from data_args to model_args 2024-07-02 18:37:55 +07:00
hoshi-hiyouga 4e4b3cc905 Merge pull request #4651 from hzhaoy/add-telechat-1b (Add TeleChat-1B) 2024-07-02 17:56:43 +08:00
hzhaoy 57b7c00430 add TeleChat-1B 2024-07-02 17:49:04 +08:00
hiyouga 4c296001c4 fix ppo callbacks 2024-07-02 17:34:56 +08:00
hoshi-hiyouga e8e6af2651 Merge branch 'main' into main 2024-07-01 21:01:09 +08:00
hiyouga 73280b7dc7 tiny fix 2024-07-01 05:43:17 +08:00
hiyouga 8c41a0aa6d tiny fix 2024-07-01 03:55:20 +08:00
hiyouga 1856a08e87 add eval acc 2024-07-01 03:51:20 +08:00
hiyouga 1771251ce3 fix #4402 #4617 (Deprecate reserved_label_len arg) 2024-07-01 01:19:27 +08:00
hiyouga d74244d568 fix #4398 #4592 2024-06-30 21:28:51 +08:00
hiyouga 2f4b89ace1 loose gemma2 attention 2024-06-29 01:42:14 +08:00
hiyouga 4d35e218b1 bf16 by default, gemma2 attns (Gemma2 finetuning cannot work until https://github.com/huggingface/transformers/pull/31674 is merged) 2024-06-28 06:00:26 +08:00
hiyouga 64f4337dac increase pissa_iter for stability 2024-06-28 03:18:54 +08:00
hiyouga 6f63050e1b add Gemma2 models 2024-06-28 01:26:50 +08:00
hiyouga 8baf3b22b0 refactor pissa, improve llamaboard 2024-06-28 01:04:24 +08:00
hoshi-hiyouga ef38daa0a4 Merge pull request #4580 from hzhaoy/bugfix-deepspeed-pissa (Fix bug when using pissa method with deepspeed) 2024-06-28 00:46:51 +08:00
hiyouga 8ed6b367e2 fix #4549 2024-06-28 00:41:58 +08:00
hiyouga e44a4f07f0 tiny fix 2024-06-27 20:14:48 +08:00
faddddeout f6b62f0070 Exit the process with the subprocess's return code when utilizing the CLI 2024-06-27 09:58:00 +00:00
hzhaoy 677c86594e fix #4579 2024-06-27 13:49:57 +08:00
hiyouga 96a5044394 add quant checks 2024-06-27 01:12:25 +08:00
hiyouga f17c9dfd84 tiny fix 2024-06-27 00:46:41 +08:00
hiyouga 29c710da3a tiny fix 2024-06-27 00:36:04 +08:00
hiyouga ad144c2265 support HQQ/EETQ #4113 2024-06-27 00:29:42 +08:00
hiyouga addca926de improve autogptq integration 2024-06-26 22:11:44 +08:00
hiyouga 8d6cd69ac4 fix #4458 2024-06-26 19:52:35 +08:00
hiyouga 59e0b4f616 fix #4556 2024-06-26 19:43:16 +08:00
hiyouga 555ca8d780 lint 2024-06-25 02:55:50 +08:00
hiyouga 1e9d0aa1e4 fix #4432 2024-06-25 02:34:04 +08:00
hiyouga cc016461e6 fix #4379 2024-06-25 02:31:44 +08:00
hiyouga 095fab58d3 tiny fix about badam 2024-06-25 01:54:53 +08:00
hoshi-hiyouga d0f953bf5b Merge pull request #4352 from Ledzy/main ([Enhancement] Support ZeRO-3 when using BAdam) 2024-06-25 01:49:13 +08:00
hiyouga 41086059b1 tiny fix 2024-06-25 01:15:19 +08:00
hoshi-hiyouga 3bed18c644 Merge pull request #4409 from kno10/patch-2 (Print help if no arguments given) 2024-06-24 23:21:31 +08:00
hoshi-hiyouga acb61f7ab7 Update cli.py 2024-06-24 23:21:10 +08:00
hoshi-hiyouga def6d280db Merge pull request #4417 from mMrBun/main (Add tool_format parameter to rewrite templates for different function call formats) 2024-06-24 23:17:55 +08:00
hoshi-hiyouga 1240bd57d8 Update template.py 2024-06-24 23:12:59 +08:00
hoshi-hiyouga dddfd516ee Update loader.py 2024-06-24 23:06:18 +08:00
hiyouga fca893d73c fix #4410 2024-06-24 22:34:31 +08:00
hoshi-hiyouga cc452c32c7 Merge pull request #4446 from stceum/bug-fix (Bug Fix: `off` is parsed as `False` in yaml file) 2024-06-24 21:41:28 +08:00
hoshi-hiyouga e90c424f55 Update parser.py 2024-06-24 21:37:42 +08:00
stceum 3ed063f281 Bug Fix: `off` is parsed as `False` in yaml file, changed to `disabled` to avoid this. 2024-06-24 20:39:31 +08:00
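This is standard YAML 1.1 behavior: bare `off`, `on`, `no`, and `yes` resolve to booleans, which is why the accepted value was changed to `disabled`. A quick illustration with PyYAML (the key name here is only illustrative):

```python
import yaml

# Bare off/on/no/yes are YAML 1.1 booleans, so "off" silently becomes False.
print(yaml.safe_load("flash_attn: off"))       # {'flash_attn': False}
print(yaml.safe_load("flash_attn: disabled"))  # {'flash_attn': 'disabled'}
print(yaml.safe_load("flash_attn: 'off'"))     # quoting keeps it a string
```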
hiyouga e507e60638 update readme 2024-06-24 18:22:12 +08:00
mMrBun 20e2e6fdcb Add tool_format to overwrite tool formatter template 2024-06-22 02:13:23 +08:00
hiyouga db9a1912e3 remove dup template 2024-06-22 01:31:32 +08:00
hiyouga 3ce44dda99 fix api 2024-06-22 00:00:38 +08:00
Erich Schubert 7d70ba7fb8 Print help if no arguments given 2024-06-21 09:14:21 +02:00
ancv 770f75dc83 move configure_packing to llamafactory.model.patcher and fix constants 2024-06-21 00:45:06 +07:00
hiyouga 8d4f5093cf tiny fix 2024-06-20 22:56:05 +08:00
hiyouga f22d8f9ca4 improve llamaboard 2024-06-19 23:46:03 +08:00
hiyouga 3f84411b5d fix llamaboard abort 2024-06-19 23:22:28 +08:00
hiyouga 3b040e8e0f update patcher 2024-06-19 21:27:00 +08:00
hiyouga 42e69a3c63 set dev version 2024-06-19 21:08:16 +08:00
hiyouga 71327ba85a release v0.8.2 2024-06-19 20:42:09 +08:00
hiyouga 2b596fb55f fix jinja template 2024-06-19 20:03:50 +08:00
hiyouga 4cff6a4ad5 fix templates 2024-06-19 17:44:05 +08:00
Jonery 5c2ff1b749 Cleaner integration. 2024-06-19 12:29:40 +08:00
hiyouga 6d2bf216ac fix bug 2024-06-19 03:49:23 +08:00
hiyouga 4f22eae8f4 use prefix to replace force system 2024-06-19 03:39:52 +08:00
hiyouga cd75b1fe9d fix tool formatter, allow parallel function #4362 2024-06-19 03:23:51 +08:00
hoshi-hiyouga c0ca42566c Merge pull request #4173 from mMrBun/main (Implemented the tool_formatter and tool_extractor for glm4 and Qwen2 tool_format) 2024-06-19 03:18:55 +08:00
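A tool formatter renders a list of tool (function) definitions into the prompt text a model family expects, and the matching extractor parses the model's reply back into a function name plus JSON arguments. The sketch below is a deliberately simplified, hypothetical pair; the actual glm4 and Qwen2 templates in the repository use different prompt wording and output markup.

```python
import json
import re

def tool_formatter(tools: list[dict]) -> str:
    # Hypothetical prompt layout: list the tool schemas, then describe the call format.
    lines = ["You have access to the following tools:"]
    lines += [json.dumps(tool, ensure_ascii=False) for tool in tools]
    lines.append('To call a tool, reply with {"name": ..., "arguments": {...}}.')
    return "\n".join(lines)

def tool_extractor(content: str):
    # Return plain text unchanged, or (name, arguments_json) when a call is found.
    match = re.search(r"\{.*\}", content, re.DOTALL)
    if match is None:
        return content
    call = json.loads(match.group(0))
    return call["name"], json.dumps(call.get("arguments", {}), ensure_ascii=False)
```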
hiyouga a233fbc258 add deepseek coder v2 #4346 2024-06-18 22:53:54 +08:00
hiyouga 4bd77d8563 fix #4357 2024-06-18 22:42:45 +08:00
hiyouga c96264bc47 fix #4335 2024-06-18 22:08:56 +08:00
Jonery 8f7c78b641 fix typo 2024-06-18 12:39:26 +08:00
Jonery 0f72aac8c9 Support distributed BAdam. 2024-06-18 12:27:47 +08:00
hiyouga 24c160df3d lint 2024-06-17 22:35:56 +08:00
hiyouga 7857c0990b update chat engine #4335 2024-06-17 19:07:17 +08:00
Jonery ea1f3ba5e0 Merge remote-tracking branch 'upstream/main' 2024-06-17 18:44:51 +08:00
Jonery 33b4372778 adapt for badam with ds zero3 2024-06-17 18:18:10 +08:00
hiyouga e2665e71c7 fix #4326 2024-06-17 18:17:48 +08:00
ancv 238f5c3d99 update packing with sdpa and eager attention mode 2024-06-16 02:25:47 +07:00
hoshi-hiyouga 29c1f31baa Update parser.py 2024-06-16 02:57:00 +08:00
hiyouga 46093b5786 fix tol 2024-06-16 01:38:44 +08:00
hiyouga 8c1046d78a support pissa 2024-06-16 01:08:12 +08:00
hiyouga 38b6b0f52e tiny fix 2024-06-16 01:06:41 +08:00
ancv 04315c3d92 remove some unused params 2024-06-15 23:00:55 +07:00
hiyouga 80a9e6bf94 use fixture 2024-06-15 20:06:17 +08:00
hiyouga 1b834f50be add tests 2024-06-15 19:51:20 +08:00
hiyouga 572d8bbfdd add minicpm #4227 2024-06-15 17:58:52 +08:00
hiyouga d87108daa6 add license 2024-06-15 17:54:33 +08:00
hiyouga d519b4d76d disable DP 2024-06-15 04:57:19 +08:00
hiyouga 9092f963db fix #4292 2024-06-15 04:47:13 +08:00
hiyouga 78589cf90c fix #4295 2024-06-15 04:34:55 +08:00
hiyouga b27269bd2b add test cases 2024-06-15 04:05:54 +08:00
hiyouga c94e6c9411 add quant check in webui export tab 2024-06-13 03:19:18 +08:00
hiyouga 6baafd4eb3 fix #4221 2024-06-13 02:48:21 +08:00
hiyouga cf9f2d6c42 fix #4209 (DeepSpeed ZeRO3 has inflight param error when calling model.eval()) 2024-06-13 02:25:50 +08:00
hiyouga 2ed8270112 clean code 2024-06-13 01:58:16 +08:00
hoshi-hiyouga 1f23f25226 Merge pull request #4246 from hzhaoy/adapt-vllm-v0.5.0 (adapt vllm==0.5.0) 2024-06-13 01:54:02 +08:00
hiyouga 713fde4259 fix lint 2024-06-13 00:48:44 +08:00
hzhaoy 8fb6366ebe adapt vllm==0.5.0 2024-06-12 18:29:03 +08:00
hiyouga 577de2fa07 fix #4242 2024-06-12 16:50:11 +08:00
Arthur Kim d65a3f7cb6 Support vllm==0.5.0 2024-06-12 16:49:12 +09:00
ancv b2c367bc61 implement efficient packing without cross-contamination attention 2024-06-12 11:56:01 +07:00
hoshi-hiyouga 9049aab911 Merge pull request #4204 from dignfei/main (fixbug: llama3 should use <|end_of_text|> to mark the end of text during continued pretraining) 2024-06-11 17:06:10 +08:00
hoshi-hiyouga 0c29233237 Update pretrain.py 2024-06-11 17:02:14 +08:00
hiyouga cca6f35108 fix deepspeed version 2024-06-11 16:52:36 +08:00
d 6979f3f848 After extensive continued pretraining and comparison experiments, this bug was found: the tokenizer.eos_token that llama3 uses during pretraining is '<|end_of_text|>', so each sample must also be terminated with this token rather than '<|eot_id|>'; otherwise severe performance degradation easily follows. 2024-06-11 16:23:40 +08:00
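In other words, each continued-pretraining sample should end with the tokenizer's own eos token, which for Llama 3 base models is '<|end_of_text|>', not the chat-turn terminator '<|eot_id|>'. A minimal sketch of that behavior, assuming a Hugging Face tokenizer:

```python
from transformers import AutoTokenizer

# Sketch: append tokenizer.eos_token ('<|end_of_text|>' for Llama 3 base models)
# to every pretraining sample instead of the chat marker '<|eot_id|>'.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

def build_pretrain_texts(samples: list[str]) -> list[str]:
    return [text + tokenizer.eos_token for text in samples]

print(build_pretrain_texts(["First document.", "Second document."]))
```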
hiyouga 89f2bd8c8c fix #4198 2024-06-11 15:38:38 +08:00
hiyouga 90e14a960d tiny fix 2024-06-11 12:48:53 +08:00
hiyouga 3f24337a8a tiny fix 2024-06-11 01:04:16 +08:00
hiyouga 91e62a098f set dev version 2024-06-11 00:50:53 +08:00
hiyouga 2b6ebd6b51 release v0.8.1 2024-06-11 00:44:26 +08:00
hiyouga a793e8456b fix #4160 (The split heads should be concatenated in dim=2) 2024-06-11 00:37:17 +08:00
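With hidden states shaped (batch, seq_len, num_heads, head_dim), the head axis is dim=2, so groups of heads that were split apart must be re-joined along that axis. A small, generic illustration of the point:

```python
import torch

# Heads live on dim=2 of a (batch, seq_len, num_heads, head_dim) tensor, so split
# head groups are re-joined with torch.cat(..., dim=2), not along seq_len (dim=1).
batch, seq_len, num_heads, head_dim = 2, 16, 8, 64
states = torch.randn(batch, seq_len, num_heads, head_dim)

first_half, second_half = states.split(num_heads // 2, dim=2)
rejoined = torch.cat((first_half, second_half), dim=2)
assert rejoined.shape == (batch, seq_len, num_heads, head_dim)
```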
hiyouga 0012762b04 update evaluator 2024-06-10 23:56:00 +08:00
hiyouga c907d81667 fix #2666 2024-06-10 21:24:15 +08:00
mMrBun 950e360ca0 Optimize the handling of QWEN2 in scenarios involving multiple tool calls. 2024-06-10 02:00:14 +08:00
mMrBun 6ed0b0c800 Removed unnecessary comments. 2024-06-09 18:25:22 +08:00
mMrBun 0f2609ce19 Merge branch 'hiyouga:main' into main 2024-06-09 18:17:24 +08:00
mMrBun cb1cbcb293 Implemented the tool_formatter and tool_extractor for glm4 tool_format 2024-06-09 18:16:15 +08:00
hiyouga 972ec9c668 fix llamafactory-cli env 2024-06-08 07:15:45 +08:00
hiyouga 3ac11e77cc set dev version 2024-06-08 06:46:09 +08:00
hiyouga 5aa4ce4756 release v0.8.0 2024-06-08 05:20:54 +08:00
hiyouga 54cd743ebf reorganize adapter code 2024-06-08 00:47:23 +08:00
hoshi-hiyouga cfd62283a9 fix #4139 2024-06-08 00:45:02 +08:00
hiyouga 06e5d136a4 add resume args in webui 2024-06-08 00:22:16 +08:00
hiyouga 8bf9da659c fix #4137 2024-06-07 19:16:06 +08:00
hiyouga f8d8690bf4 tiny fix 2024-06-07 05:19:21 +08:00
hiyouga 4489d73ac7 fix ppo trainer save zero3 model (accelerator.get_state_dict(ds_model) should be called at all ranks) 2024-06-07 05:14:19 +08:00
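Under ZeRO-3, assembling the full state dict is a collective operation, so `accelerator.get_state_dict(model)` must run on every rank even though only the main process writes the checkpoint. A hedged sketch of the pattern:

```python
# Sketch: get_state_dict() is collective under DeepSpeed ZeRO-3 and must be
# called on all ranks; only the main process saves the gathered weights.
def save_zero3_model(accelerator, model, output_dir: str) -> None:
    state_dict = accelerator.get_state_dict(model)  # all ranks participate
    if accelerator.is_main_process:
        accelerator.unwrap_model(model).save_pretrained(output_dir, state_dict=state_dict)
    accelerator.wait_for_everyone()
```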
hiyouga 2702d7e952 fix ppo in trl 0.8.6 2024-06-07 04:48:29 +08:00
hiyouga f9e818d79c fix #4120 2024-06-07 04:18:05 +08:00
hiyouga ccc8b64cc2 update data processors 2024-06-07 04:15:40 +08:00
hoshi-hiyouga 181dbb0d05 Merge pull request #4009 from AlongWY/main (supervised packing with greedy knapsack algorithm) 2024-06-07 03:48:46 +08:00
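The greedy knapsack idea: sort examples by length, then fill each packed sequence with the longest remaining examples that still fit under the cutoff. A simplified sketch operating on lengths only (not the repository's exact implementation):

```python
import bisect

def greedy_knapsack(lengths: list[int], cutoff_len: int) -> list[list[int]]:
    """Group lengths into knapsacks whose sums stay at or below cutoff_len."""
    lengths = sorted(lengths)
    knapsacks: list[list[int]] = []
    while lengths:
        remaining, current = cutoff_len, []
        while lengths:
            idx = bisect.bisect_right(lengths, remaining) - 1  # largest length that fits
            if idx < 0:
                break
            current.append(lengths.pop(idx))
            remaining -= current[-1]
        if not current:  # a single example longer than cutoff_len: take it anyway
            current.append(lengths.pop())
        knapsacks.append(current)
    return knapsacks

print(greedy_knapsack([7, 3, 5, 2, 9, 4], cutoff_len=10))  # [[9], [7, 3], [5, 4], [2]]
```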
hoshi-hiyouga c09ad8bab3 Update supervised.py 2024-06-07 03:42:08 +08:00
hoshi-hiyouga 788e8232fc Update supervised.py 2024-06-07 03:38:23 +08:00
hoshi-hiyouga 8cecade708 Update supervised.py 2024-06-07 03:38:04 +08:00
hiyouga 8e95648850 add qwen2 models 2024-06-07 00:22:57 +08:00