Commit Graph

965 Commits

Author SHA1 Message Date
hiyouga 5c62881c5a fix bug in galore optimizer 2024-04-21 18:53:22 +08:00
hiyouga f58425ab45 fix mod stuff 2024-04-21 18:11:10 +08:00
hoshi-hiyouga d0273787be Merge pull request #3338 from astramind-ai/main: Adding Mixture of Depth 2024-04-21 18:05:52 +08:00
hoshi-hiyouga 1fa287fd63 fix #3348 2024-04-20 10:34:09 +08:00
hiyouga ba559a659a fix #3352 2024-04-19 22:40:01 +08:00
hiyouga 14a605a2da fix llama3 template 2024-04-19 15:46:51 +08:00
Marco 4fb7e046b3 fix small typo 2024-04-18 20:33:29 +02:00
Marco 620add7b9f Added Mixture of Depths 2024-04-18 20:31:24 +02:00
hoshi-hiyouga 2aaaede247 support llama3 2024-04-19 01:13:50 +08:00
hiyouga 942362d008 fix #3324 2024-04-18 15:34:45 +08:00
hiyouga 3b43a3b7c5 tiny fix 2024-04-18 00:22:17 +08:00
hiyouga cab0598fd0 add mixtral 8x22B models 2024-04-17 23:35:59 +08:00
hiyouga 5f86053d75 add CodeQwen models 2024-04-17 23:27:22 +08:00
hiyouga c9a477322d fix #3316 2024-04-17 22:54:34 +08:00
hiyouga 6d641af703 fix #3317 2024-04-17 22:17:19 +08:00
hiyouga 278c5e10c4 lint 2024-04-16 18:21:09 +08:00
hoshi-hiyouga aa3206ec26 Merge pull request #3291 from codemayq/main: support for previewing custom dataset in directory format 2024-04-16 18:12:09 +08:00
hiyouga c00f0771a5 Update parser.py 2024-04-16 18:09:31 +08:00
hiyouga 5d62a51c12 update readme and gradio version 2024-04-16 18:09:16 +08:00
hiyouga e3d8fc75eb support badam for all stages 2024-04-16 17:44:48 +08:00
hoshi-hiyouga 4d660c5ade Merge pull request #3287 from Ledzy/badam: [Feature] Add BAdam algorithm 2024-04-16 17:32:16 +08:00
hoshi-hiyouga c9828f4c6e Update utils.py 2024-04-16 17:30:12 +08:00
hoshi-hiyouga 6700a1b9fa Update trainer.py 2024-04-16 17:29:52 +08:00
hoshi-hiyouga 38a56706e0 Update utils.py 2024-04-16 17:29:30 +08:00
hoshi-hiyouga a950f3b81d Update patcher.py 2024-04-16 17:29:19 +08:00
hoshi-hiyouga 750cdf2e74 Update adapter.py 2024-04-16 17:28:12 +08:00
hoshi-hiyouga 4660703674 Update parser.py 2024-04-16 17:27:25 +08:00
hoshi-hiyouga 5b59ff4212 Update parser.py 2024-04-16 17:27:02 +08:00
hoshi-hiyouga ec899cccf3 Update finetuning_args.py 2024-04-16 17:26:30 +08:00
Jonery 7ecb61822b resolve gradient checkpointing issue. 2024-04-16 12:05:27 +08:00
codingma 62294289dc add check 2024-04-16 10:56:39 +08:00
codingma 75aa6392e8 support for previewing custom dataset in directory format 2024-04-16 10:43:14 +08:00
hiyouga b3ac14ffc4 add empty template 2024-04-16 03:10:02 +08:00
hiyouga 7dc72fb58c support unsloth 2024.4 2024-04-16 00:25:03 +08:00
hiyouga 6543f3d449 add codegemma 2024-04-16 00:11:15 +08:00
hiyouga e0dbac2845 support cohere commandR #3184 2024-04-15 23:26:42 +08:00
Jonery 06c8908d3f Feature BAdam 2024-04-15 23:15:27 +08:00
hoshi-hiyouga 7a8ae3f4ac Merge pull request #3254 from marko1616/feature/Add-support-for-CohereForAI/c4ai-command-r-plus: Add template&support for c4ai-command-r/plus (tested) 2024-04-15 22:59:35 +08:00
hoshi-hiyouga 3ccf0d0977 Update template.py 2024-04-15 22:58:01 +08:00
hoshi-hiyouga 268f53dddb Update constants.py 2024-04-15 22:56:55 +08:00
hiyouga cce52351b5 update examples 2024-04-15 22:14:34 +08:00
marko1616 2c89b38720 change default_system according to official template 2024-04-15 20:45:46 +08:00
marko1616 90c5dddf9a Revert "Add support for function call(Not strictly following origin)" (This reverts commit d7b9bbc8b9.) 2024-04-15 20:27:09 +08:00
marko1616 d7b9bbc8b9 Add support for function call(Not strictly following origin) 2024-04-15 20:16:52 +08:00
hoshi-hiyouga 0e0942d388 Merge pull request #3276 from liu-zichen/fix_mixtral: fix: turn on output_router_logits of mixtral 2024-04-15 15:38:16 +08:00
hiyouga efc345c4b0 fix #3273 2024-04-15 15:32:58 +08:00
liuzc 9f4fe62386 fix: mixtral output_router_logits 2024-04-15 12:11:49 +08:00
marko1616 ab033dac4f Typo fix 2024-04-13 17:30:21 +08:00
marko1616 42806323f0 Typo fix 2024-04-13 07:52:11 +08:00
marko1616 d0705518ee Add c4ai-command-r-plus link 2024-04-13 07:32:40 +08:00
marko1616 6574a721d2 Add template&support(Not tested) 2024-04-13 04:31:33 +08:00
hiyouga c53a11b6fd fix model card 2024-04-12 17:11:59 +08:00
hiyouga 232642a621 fix #3238 2024-04-12 14:28:11 +08:00
hiyouga 3dfe4cf611 set dev version 2024-04-11 20:27:34 +08:00
hiyouga 9d4c949461 release v0.6.2 2024-04-11 20:08:51 +08:00
hiyouga 51d0a1a19e Merge branch 'main' of https://github.com/hiyouga/LLaMA-Factory 2024-04-10 23:58:18 +08:00
hiyouga a99f5ed0b6 fix #3225 2024-04-10 23:57:59 +08:00
hoshi-hiyouga 98bc97d8d2 Update adapter.py 2024-04-10 00:57:51 +08:00
hoshi-hiyouga 2111b586b6 Update adapter.py 2024-04-10 00:57:30 +08:00
Erich Schubert b5eefe5c4c Pass additional_target to unsloth (Fixes #3200) 2024-04-09 17:53:40 +02:00
hiyouga 7f6c2486b8 fix quant infer and qwen2moe 2024-04-09 17:12:59 +08:00
hiyouga 9a99fbc86d tiny fix 2024-04-08 21:28:39 +08:00
hoshi-hiyouga 4c6c4a0d88 Merge pull request #3161 from hiyouga/feature/add-mediatek-model: support Breeze-7B 2024-04-08 20:56:51 +08:00
codingma 7b76b4ca08 add empty line 2024-04-07 18:28:08 +08:00
codingma 34bdcba017 rename template to breeze 2024-04-07 18:27:20 +08:00
codingma 5a780e9eec rename template to breeze 2024-04-07 11:39:54 +08:00
codingma 2565a32bd9 support https://github.com/hiyouga/LLaMA-Factory/issues/3152 2024-04-07 11:34:01 +08:00
sliderSun 1d117b7bb6 fix spell error 2024-04-07 10:59:15 +08:00
sliderSun 21650d467c support Qwen1.5-32B 2024-04-07 10:56:03 +08:00
sliderSun 77044d9ef4 support Qwen1.5-32B 2024-04-07 10:26:13 +08:00
hiyouga a6d943804b tiny fix 2024-04-04 02:19:03 +08:00
hiyouga 4b920f24d3 back to gradio 4.21 and fix chat 2024-04-04 02:07:20 +08:00
hiyouga 5ddcecda50 fix bug in latest gradio 2024-04-04 00:55:31 +08:00
hiyouga 7f6e412604 fix requires for windows 2024-04-03 21:56:43 +08:00
hiyouga 148bda353f fix resize vocab at inference #3022 2024-04-03 18:14:24 +08:00
hiyouga ce77d98872 fix #3116 2024-04-03 14:47:59 +08:00
hiyouga 92dab8a90b simplify readme 2024-04-02 20:07:43 +08:00
hiyouga b267aeb53f add moe aux loss control #3085 2024-04-02 14:26:31 +08:00
hiyouga 9ddbe2866a fix #3022 2024-04-02 13:58:39 +08:00
hiyouga dd73a0c248 set dev version 2024-04-01 23:24:08 +08:00
hiyouga 4a6ca621c0 fix #3083 2024-04-01 22:53:52 +08:00
hiyouga 54b7d34908 add qwen1.5 moe 2024-04-01 21:49:40 +08:00
hiyouga aee634cd20 fix #3077 2024-04-01 21:35:18 +08:00
hiyouga eb259cc573 support infer 4bit model on GPUs #3023 2024-04-01 17:34:04 +08:00
hiyouga d0842f6828 update webui 2024-04-01 16:23:28 +08:00
hiyouga 816d714146 fix ORPO loss 2024-04-01 14:42:41 +08:00
hiyouga 5b9b40403d fix IPO and ORPO loss 2024-04-01 14:37:53 +08:00
hiyouga 5907216a1c fix plots 2024-03-31 19:43:48 +08:00
hiyouga 68aaa4904b use log1p in orpo loss (https://github.com/huggingface/trl/pull/1491) 2024-03-31 19:27:08 +08:00
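The log1p change referenced in the commit above (TRL PR 1491) concerns computing the ORPO odds-ratio term in a numerically stable way. A minimal sketch of the idea follows, using scalar log-probabilities; the function names are illustrative assumptions, not LLaMA-Factory's actual implementation:

```python
import math

def log_odds(logp: float) -> float:
    """log(p / (1 - p)) from a log-probability logp = log(p).

    math.log1p(-math.exp(logp)) computes log(1 - p) without first
    materializing 1 - p, which loses precision when p is near 0.
    """
    return logp - math.log1p(-math.exp(logp))

def orpo_odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    # -log(sigmoid(z)) for the log-odds ratio z, written as log1p(exp(-z))
    z = log_odds(logp_chosen) - log_odds(logp_rejected)
    return math.log1p(math.exp(-z))
```

When the chosen and rejected responses are equally likely, z = 0 and the loss is log 2; the loss shrinks as the chosen response becomes more probable than the rejected one.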
hiyouga 099db6acc0 update readme 2024-03-31 18:46:34 +08:00
hiyouga 5195add324 support orpo in webui 2024-03-31 18:34:59 +08:00
hiyouga 17bf8a2c3a support ORPO 2024-03-31 18:29:50 +08:00
hiyouga 27776c3474 tiny fix 2024-03-31 00:10:29 +08:00
marko1616 d9a5134617 fix blank line contains whitespace 2024-03-30 23:46:55 +08:00
marko1616 eb178eaff3 Fix Llama model save for full param train 2024-03-30 23:45:04 +08:00
hiyouga 7a086ed333 support save args in webui #2807 #3046 (some ideas are borrowed from @marko1616) 2024-03-30 23:09:12 +08:00
hiyouga 831c5321ac upgrade gradio to 4.21.0 2024-03-30 20:37:08 +08:00
hiyouga ca793028c6 release v0.6.1 2024-03-29 11:36:08 +08:00
hiyouga 8d603f8820 fix #2982 2024-03-28 20:22:31 +08:00
hiyouga b19c14870d fix #3010 2024-03-28 18:31:17 +08:00
hiyouga 8c77b10912 update trainers 2024-03-28 18:16:27 +08:00
hoshi-hiyouga 3bcd41b639 fix ds optimizer 2024-03-26 23:39:56 +08:00
hiyouga 3164b4f11b fix bug 2024-03-26 17:30:12 +08:00
hiyouga 511f675402 fix #2961 2024-03-26 17:26:14 +08:00
hiyouga ba70aca8fb release v0.6.0 (real) 2024-03-25 23:37:48 +08:00
hiyouga 98a42cbdaa tiny fix 2024-03-25 23:28:52 +08:00
hiyouga 1484f76a95 add arg check 2024-03-25 22:42:58 +08:00
hiyouga 6f2b563f12 release v0.6.0 2024-03-25 22:38:56 +08:00
hiyouga 558a538724 tiny fix 2024-03-25 21:18:08 +08:00
marko1616 c8f0d99704 pass ruff check 2024-03-24 16:12:10 +08:00
marko1616 6f080fdba3 fix Llama lora merge crash 2024-03-24 03:06:11 +08:00
marko1616 51349ea1cc fix Llama lora merge crash 2024-03-24 02:55:23 +08:00
marko1616 c1e2c4ea45 fix Llama lora merge crash 2024-03-24 02:44:35 +08:00
hiyouga 140ad4ad56 fix #2936 2024-03-24 00:43:21 +08:00
hiyouga 7afbc85dae fix #2928 2024-03-24 00:34:54 +08:00
hiyouga a1c8c98c5f fix #2941 2024-03-24 00:28:44 +08:00
hiyouga 8408225162 support fsdp + qlora 2024-03-21 00:36:06 +08:00
hiyouga 9bec3c98a2 fix #2777 #2895 2024-03-20 17:59:45 +08:00
hiyouga 7b8f502901 fix #2346 2024-03-20 17:56:33 +08:00
hiyouga 8e04794b2d fix packages 2024-03-17 22:32:03 +08:00
hiyouga 85c376fc1e fix patcher 2024-03-15 19:18:42 +08:00
hoshi-hiyouga 113cc04719 Merge pull request #2849 from S3Studio/DockerizeSupport: Improve Dockerize support 2024-03-15 19:16:02 +08:00
hiyouga 6bc2c23b6d fix export 2024-03-15 15:06:30 +08:00
S3Studio e75407febd Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically. However, if the host machine's GPU is not compatible with the library, an exception is raised during training: "FlashAttention only supports Ampere GPUs or newer." So if the --flash_attn flag is not set, an additional patch to the qwen model's config is necessary to change the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
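The config workaround described in the commit above could be sketched as follows. The `use_flash_attn` field matches the one named in the commit message; the helper name and the stand-in config object are illustrative assumptions, not the repository's actual code:

```python
from types import SimpleNamespace

def patch_qwen_flash_attn(config):
    """Force flash attention off when the config leaves it on "auto".

    Useful when flash-attn is installed in the image but the host GPU
    (pre-Ampere) cannot run it, which would otherwise raise at train time.
    """
    if getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False
    return config

# Stand-in for a loaded Qwen model config whose default is "auto"
config = SimpleNamespace(use_flash_attn="auto")
patch_qwen_flash_attn(config)
```

An explicitly enabled setting is left untouched, so the patch only overrides the "auto" default rather than the user's choice.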
hiyouga 6ebde4f23e tiny fix 2024-03-14 21:19:06 +08:00
hiyouga 3b4a59bfb1 fix export 2024-03-14 18:17:01 +08:00
hiyouga 8172530d54 fix bug 2024-03-13 23:55:31 +08:00
hiyouga 714d936dfb fix bug 2024-03-13 23:43:42 +08:00
hiyouga 72367307df improve lora+ impl. 2024-03-13 23:32:51 +08:00
齐保元 a0965cd62c [FEATURE]: ADD LORA+ ALGORITHM 2024-03-13 19:43:27 +08:00
hiyouga 0b4a5bf509 fix #2817 2024-03-13 12:42:03 +08:00
hiyouga b9f87cdc11 fix #2802 2024-03-13 12:33:45 +08:00
hiyouga 96ce76cd27 fix kv cache 2024-03-13 01:21:50 +08:00
hiyouga 19ef482649 support QDoRA 2024-03-12 22:12:42 +08:00
hiyouga 70a3052dd8 patch for gemma cpt 2024-03-12 21:21:54 +08:00
hiyouga 60cc17f3a8 fix plot issues 2024-03-12 18:41:35 +08:00
hiyouga b3247d6a16 support olmo 2024-03-12 18:30:38 +08:00
hiyouga 8d8956bad5 fix #2802 2024-03-12 17:08:34 +08:00
hiyouga 07f9b754a7 fix #2782 #2798 2024-03-12 15:53:29 +08:00
hiyouga e874c00906 fix #2775 2024-03-11 00:42:54 +08:00
hiyouga 352693e2dc tiny fix 2024-03-11 00:17:18 +08:00
hiyouga be99799413 update parser 2024-03-10 13:35:20 +08:00
hiyouga 8664262cde support layerwise galore 2024-03-10 00:24:11 +08:00
hiyouga 18ffce36b5 fix #2732 2024-03-09 22:37:16 +08:00
hiyouga bdb496644c allow non-packing pretraining 2024-03-09 22:21:46 +08:00
hiyouga 412c52e325 fix #2766 2024-03-09 21:35:24 +08:00
hiyouga af0e370fb1 use default arg for freeze tuning 2024-03-09 06:08:48 +08:00
hiyouga 393c2de27c update hardware requirements 2024-03-09 03:58:18 +08:00
hiyouga e8dd38b7fd fix #2756 , patch #2746 2024-03-09 02:01:26 +08:00
hoshi-hiyouga 516d0ddc66 Merge pull request #2746 from stephen-nju/main: fix deepspeed ppo RuntimeError 2024-03-09 01:37:00 +08:00