Commit Graph

67 Commits

Author SHA1 Message Date
hiyouga 5ee04d418c update readme 2024-04-26 23:39:19 +08:00
hoshi-hiyouga 8f91420223 Merge pull request #3471 from BUAADreamer/main 2024-04-26 23:36:41 +08:00
    add llava_150k en/zh mllm sft data
hoshi-hiyouga c29b257007 Update dataset_info.json 2024-04-26 23:34:34 +08:00
BUAADreamer a177872010 add llava_150k en/zh mllm sft data 2024-04-26 23:18:58 +08:00
hiyouga 168f56683a release v0.7.0 2024-04-26 23:18:00 +08:00
hiyouga e057c8de48 support mllm hf inference 2024-04-26 05:34:58 +08:00
hoshi-hiyouga f8c26e6a34 Update dataset_info.json 2024-04-26 03:03:36 +08:00
BUAADreamer 42c90c8183 merge data part to the text stream 2024-04-25 19:58:47 +08:00
BUAADreamer c6dd89918f merge data part to the text stream 2024-04-25 19:19:59 +08:00
BUAADreamer cfb485eddf add llava and instructblip 2024-04-25 00:22:43 +08:00
BUAADreamer 4dcb11eab7 add multimodal LLM BLIP-2 and InstructBLIP 2024-04-23 18:45:43 +08:00
hiyouga 6339edefff add dpo mix dataset 2024-04-20 01:31:38 +08:00
hiyouga d1fb6c72b5 fix #3247 2024-04-12 17:41:33 +08:00
li.yunhao 9c2ef9cdf4 fix pile datset hf hub url 2024-03-30 16:06:10 +08:00
hiyouga 3271af2afc add orca_dpo_pairs dataset 2024-03-20 20:09:06 +08:00
hiyouga 894d183214 update readme, add starcoder2, cosmopedia 2024-03-03 01:01:46 +08:00
hiyouga 32884523c5 update data 2024-03-02 19:37:18 +08:00
hiyouga 1630a4cb8f fix #2533 2024-02-21 22:47:48 +08:00
hiyouga 22acab8aff fix #2481 2024-02-15 19:07:47 +08:00
hiyouga 7d2dc83c5e improve aligner 2024-02-10 16:39:19 +08:00
Mark Mueller 1d3598afa1 Slim Orca data parsing 2024-02-08 19:32:20 +01:00
Johann-Peter Hartmann 49c69ea4b9 WS fix 2024-02-06 20:13:04 +01:00
Johann-Peter Hartmann 1126563505 add ranking to dpo dataset 2024-02-06 20:12:36 +01:00
Johann-Peter Hartmann 870182c3a9 remove comma 2024-02-03 08:48:39 +01:00
Johann-Peter Hartmann d9a8301ed4 Add support for german datasets 2024-01-30 10:18:01 +01:00
hiyouga dbaaa4546e Update dataset_info.json 2024-01-23 00:10:32 +08:00
hiyouga f1067d2b58 enable cutoff len 2024-01-18 12:25:42 +08:00
hiyouga d9f1cae351 support function calling 2024-01-18 09:54:23 +08:00
hiyouga 5b93d545e2 tiny update 2023-12-25 18:29:34 +08:00
hiyouga 71389be37c support autogptq in llama board #246 2023-12-16 16:31:30 +08:00
hiyouga 0a9c6e0146 support system column #1765 2023-12-12 19:45:59 +08:00
hiyouga d5b2c57a35 fix modelscope data hub 2023-12-12 18:33:06 +08:00
hoshi-hiyouga 6382efec52 Merge branch 'main' into feat/support_ms 2023-12-12 17:55:32 +08:00
xingjun.wang e80a989d49 modify guanaco 2023-12-12 15:00:37 +08:00
xingjun.wang 73b50a26b9 update dataset info 2023-12-12 14:53:59 +08:00
xingjun.wang 09533e95ed update args for MsDataset.load 2023-12-12 13:02:54 +08:00
xingjun.wang fe4acc66b0 add new datasets 2023-12-12 12:44:15 +08:00
xingjun.wang 0ce18a3782 add open orca 2023-12-12 12:34:04 +08:00
hiyouga 28d5de7e78 fix #1784 2023-12-09 20:53:18 +08:00
yuze.zyz e4cf2a75ca fix typo 2023-12-08 18:13:26 +08:00
yuze.zyz 9c2247d700 support ms dataset 2023-12-08 18:00:57 +08:00
hiyouga bf6f6aeefe fix #1696 2023-12-01 15:34:50 +08:00
Marco 9468ee9012 Update dataset_info.json 2023-11-30 16:21:34 +01:00
    Added the Nectar dataset already preprocessed and divided in sft and rl to which I added a preprompt to each instruction since it has been seen that this increase instruction following
hiyouga 7b1aa6f63c update dataset 2023-11-17 23:19:12 +08:00
hiyouga ce78303600 support full-parameter PPO 2023-11-16 02:08:04 +08:00
hiyouga 386f590209 add template, modify datasets 2023-11-09 15:53:23 +08:00
hiyouga cc8ffa10d8 update data readme (zh) 2023-11-02 23:42:49 +08:00
hiyouga a837172413 support sharegpt format, add datasets 2023-11-02 23:10:04 +08:00
hiyouga 026af87e7f add MathInstruct dataset 2023-09-13 22:30:14 +08:00
hiyouga a9d1fb72f7 refactor dataset_attr, add eos in pt, fix #757 2023-09-01 19:00:45 +08:00