hiyouga
5ee04d418c
update readme
2024-04-26 23:39:19 +08:00
hoshi-hiyouga
8f91420223
Merge pull request #3471 from BUAADreamer/main
add llava_150k en/zh mllm sft data
2024-04-26 23:36:41 +08:00
hoshi-hiyouga
c29b257007
Update dataset_info.json
2024-04-26 23:34:34 +08:00
BUAADreamer
a177872010
add llava_150k en/zh mllm sft data
2024-04-26 23:18:58 +08:00
hiyouga
168f56683a
release v0.7.0
2024-04-26 23:18:00 +08:00
hiyouga
e057c8de48
support mllm hf inference
2024-04-26 05:34:58 +08:00
hoshi-hiyouga
f8c26e6a34
Update dataset_info.json
2024-04-26 03:03:36 +08:00
BUAADreamer
42c90c8183
merge data part to the text stream
2024-04-25 19:58:47 +08:00
BUAADreamer
c6dd89918f
merge data part to the text stream
2024-04-25 19:19:59 +08:00
BUAADreamer
cfb485eddf
add llava and instructblip
2024-04-25 00:22:43 +08:00
BUAADreamer
4dcb11eab7
add multimodal LLM BLIP-2 and InstructBLIP
2024-04-23 18:45:43 +08:00
hiyouga
6339edefff
add dpo mix dataset
2024-04-20 01:31:38 +08:00
hiyouga
d1fb6c72b5
fix #3247
2024-04-12 17:41:33 +08:00
li.yunhao
9c2ef9cdf4
fix pile dataset hf hub url
2024-03-30 16:06:10 +08:00
hiyouga
3271af2afc
add orca_dpo_pairs dataset
2024-03-20 20:09:06 +08:00
hiyouga
894d183214
update readme, add starcoder2, cosmopedia
2024-03-03 01:01:46 +08:00
hiyouga
32884523c5
update data
2024-03-02 19:37:18 +08:00
hiyouga
1630a4cb8f
fix #2533
2024-02-21 22:47:48 +08:00
hiyouga
22acab8aff
fix #2481
2024-02-15 19:07:47 +08:00
hiyouga
7d2dc83c5e
improve aligner
2024-02-10 16:39:19 +08:00
Mark Mueller
1d3598afa1
Slim Orca data parsing
2024-02-08 19:32:20 +01:00
Johann-Peter Hartmann
49c69ea4b9
WS fix
2024-02-06 20:13:04 +01:00
Johann-Peter Hartmann
1126563505
add ranking to dpo dataset
2024-02-06 20:12:36 +01:00
Johann-Peter Hartmann
870182c3a9
remove comma
2024-02-03 08:48:39 +01:00
Johann-Peter Hartmann
d9a8301ed4
Add support for german datasets
2024-01-30 10:18:01 +01:00
hiyouga
dbaaa4546e
Update dataset_info.json
2024-01-23 00:10:32 +08:00
hiyouga
f1067d2b58
enable cutoff len
2024-01-18 12:25:42 +08:00
hiyouga
d9f1cae351
support function calling
2024-01-18 09:54:23 +08:00
hiyouga
5b93d545e2
tiny update
2023-12-25 18:29:34 +08:00
hiyouga
71389be37c
support autogptq in llama board #246
2023-12-16 16:31:30 +08:00
hiyouga
0a9c6e0146
support system column #1765
2023-12-12 19:45:59 +08:00
hiyouga
d5b2c57a35
fix modelscope data hub
2023-12-12 18:33:06 +08:00
hoshi-hiyouga
6382efec52
Merge branch 'main' into feat/support_ms
2023-12-12 17:55:32 +08:00
xingjun.wang
e80a989d49
modify guanaco
2023-12-12 15:00:37 +08:00
xingjun.wang
73b50a26b9
update dataset info
2023-12-12 14:53:59 +08:00
xingjun.wang
09533e95ed
update args for MsDataset.load
2023-12-12 13:02:54 +08:00
xingjun.wang
fe4acc66b0
add new datasets
2023-12-12 12:44:15 +08:00
xingjun.wang
0ce18a3782
add open orca
2023-12-12 12:34:04 +08:00
hiyouga
28d5de7e78
fix #1784
2023-12-09 20:53:18 +08:00
yuze.zyz
e4cf2a75ca
fix typo
2023-12-08 18:13:26 +08:00
yuze.zyz
9c2247d700
support ms dataset
2023-12-08 18:00:57 +08:00
hiyouga
bf6f6aeefe
fix #1696
2023-12-01 15:34:50 +08:00
Marco
9468ee9012
Update dataset_info.json
Added the Nectar dataset, already preprocessed and split into sft and rl parts. I added a preprompt to each instruction, since this has been seen to improve instruction following.
2023-11-30 16:21:34 +01:00
hiyouga
7b1aa6f63c
update dataset
2023-11-17 23:19:12 +08:00
hiyouga
ce78303600
support full-parameter PPO
2023-11-16 02:08:04 +08:00
hiyouga
386f590209
add template, modify datasets
2023-11-09 15:53:23 +08:00
hiyouga
cc8ffa10d8
update data readme (zh)
2023-11-02 23:42:49 +08:00
hiyouga
a837172413
support sharegpt format, add datasets
2023-11-02 23:10:04 +08:00
hiyouga
026af87e7f
add MathInstruct dataset
2023-09-13 22:30:14 +08:00
hiyouga
a9d1fb72f7
refactor dataset_attr, add eos in pt, fix #757
2023-09-01 19:00:45 +08:00