hiyouga
7ea1a1f5b3
Update wechat.jpg
2024-03-26 16:24:42 +08:00
hiyouga
ba70aca8fb
release v0.6.0 (real)
2024-03-25 23:37:48 +08:00
hiyouga
98a42cbdaa
tiny fix
2024-03-25 23:28:52 +08:00
hiyouga
7b3d8188f5
update readme
2024-03-25 23:06:13 +08:00
hoshi-hiyouga
f633ac6646
Merge pull request #2967 from Tsumugii24/main
...
Update README_zh.md
2024-03-25 23:02:22 +08:00
Tsumugii24
1704599503
Update README.md
2024-03-25 22:54:38 +08:00
Tsumugii24
7aa77a3451
Update README_zh.md
2024-03-25 22:54:26 +08:00
hiyouga
1484f76a95
add arg check
2024-03-25 22:42:58 +08:00
hiyouga
6f2b563f12
release v0.6.0
2024-03-25 22:38:56 +08:00
Tsumugii24
bb4ca1691a
Update README_zh.md
2024-03-25 22:31:03 +08:00
hoshi-hiyouga
f33a3dfadc
Merge pull request #2963 from rkinas/patch-1
...
Update requirements.txt
2024-03-25 21:49:34 +08:00
Remek Kinas
b02899bf89
Update requirements.txt
2024-03-25 14:30:58 +01:00
hiyouga
558a538724
tiny fix
2024-03-25 21:18:08 +08:00
hoshi-hiyouga
49f9dbb4b1
Merge pull request #2945 from marko1616/bugfix/lora-model-merge
...
Fix a crash during LoRA merging for some models on transformers > 4.36.2, caused by generation config validation
2024-03-25 13:36:08 +08:00
marko1616
c8f0d99704
pass ruff check
2024-03-24 16:12:10 +08:00
marko1616
6f080fdba3
fix Llama lora merge crash
2024-03-24 03:06:11 +08:00
marko1616
51349ea1cc
fix Llama lora merge crash
2024-03-24 02:55:23 +08:00
marko1616
c1e2c4ea45
fix Llama lora merge crash
2024-03-24 02:44:35 +08:00
hiyouga
140ad4ad56
fix #2936
2024-03-24 00:43:21 +08:00
hiyouga
7afbc85dae
fix #2928
2024-03-24 00:34:54 +08:00
hiyouga
a1c8c98c5f
fix #2941
2024-03-24 00:28:44 +08:00
hiyouga
564d57aa23
Update wechat.jpg
2024-03-22 14:00:37 +08:00
hoshi-hiyouga
ce261fdd64
Merge pull request #2919 from 0xez/main
...
Update README.md, fix the release date of the paper
2024-03-22 12:12:24 +08:00
0xez
be0360303d
Update README_zh.md, fix the release date of the paper
2024-03-22 10:41:17 +08:00
0xez
675ba41562
Update README.md, fix the release date of the paper
2024-03-21 22:14:48 +08:00
hiyouga
96702620c4
move file
2024-03-21 17:05:17 +08:00
hiyouga
5eaa50fa01
add citation
2024-03-21 17:04:10 +08:00
hiyouga
0581bfdbc7
paper release
2024-03-21 13:49:17 +08:00
hiyouga
bfe7a91289
update readme
2024-03-21 00:48:42 +08:00
hiyouga
8408225162
support fsdp + qlora
2024-03-21 00:36:06 +08:00
hiyouga
3271af2afc
add orca_dpo_pairs dataset
2024-03-20 20:09:06 +08:00
hoshi-hiyouga
b2dfbd728f
Merge pull request #2905 from SirlyDreamer/main
...
Follow HF_ENDPOINT environment variable
2024-03-20 18:09:54 +08:00
hiyouga
9bec3c98a2
fix #2777 #2895
2024-03-20 17:59:45 +08:00
hiyouga
7b8f502901
fix #2346
2024-03-20 17:56:33 +08:00
SirlyDreamer
e165965341
Follow HF_ENDPOINT environment variable
2024-03-20 08:31:30 +00:00
hoshi-hiyouga
a773035709
Merge pull request #2903 from khazic/main
...
Updated README with new information
2024-03-20 16:13:44 +08:00
khazic
8d10fa71c2
Updated README with new information
2024-03-20 14:38:08 +08:00
khazic
0531dac30d
Updated README with new information
2024-03-20 14:21:16 +08:00
刘一博
df9b4fb90a
Updated README with new information
2024-03-20 14:11:28 +08:00
hiyouga
bea31b9b12
Update wechat.jpg
2024-03-18 16:48:32 +08:00
hiyouga
8e04794b2d
fix packages
2024-03-17 22:32:03 +08:00
hiyouga
85c376fc1e
fix patcher
2024-03-15 19:18:42 +08:00
hoshi-hiyouga
113cc04719
Merge pull request #2849 from S3Studio/DockerizeSupport
...
Improve Dockerize support
2024-03-15 19:16:02 +08:00
hiyouga
6bc2c23b6d
fix export
2024-03-15 15:06:30 +08:00
S3Studio
e75407febd
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image and the Qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception will be raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, an additional patch to the Qwen model's config is necessary to change the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
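The config patch described in the commit above can be sketched as a small rewrite of the model config; the function name and dict shape here are hypothetical illustrations, not LLaMA-Factory's actual code:

```python
# Hypothetical sketch of the described patch: when --flash_attn is not
# requested, force Qwen's use_flash_attn from "auto" to False so GPUs
# older than Ampere don't crash inside flash-attn at training time.
def patch_qwen_config(config: dict, flash_attn_requested: bool) -> dict:
    if not flash_attn_requested and config.get("use_flash_attn") == "auto":
        config["use_flash_attn"] = False
    return config

cfg = patch_qwen_config({"use_flash_attn": "auto"}, flash_attn_requested=False)
print(cfg["use_flash_attn"])  # → False
```

When the flag is set, the "auto" value is left untouched so flash-attn can still probe the GPU itself.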
S3Studio
6a5693d11d
improve Docker build and runtime parameters
...
Modify the installation method of the extra Python library.
Utilize the host machine's shared memory to improve training performance.
2024-03-15 08:57:46 +08:00
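The shared-memory tweak mentioned above typically amounts to a Docker runtime flag; this is a config fragment with illustrative values, and the image tag and size are assumptions, not from the commit:

```shell
# Give the container extra shared memory (used by PyTorch dataloader
# workers); 16G and the image tag are placeholder values.
docker run --gpus all --shm-size 16G -it llama-factory:latest bash
```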
hiyouga
6ebde4f23e
tiny fix
2024-03-14 21:19:06 +08:00
hiyouga
3b4a59bfb1
fix export
2024-03-14 18:17:01 +08:00
hiyouga
8172530d54
fix bug
2024-03-13 23:55:31 +08:00
hiyouga
714d936dfb
fix bug
2024-03-13 23:43:42 +08:00