1052 Commits

Author SHA1 Message Date
hiyouga 558a538724 tiny fix 2024-03-25 21:18:08 +08:00
hoshi-hiyouga 49f9dbb4b1
Merge pull request #2945 from marko1616/bugfix/lora-model-merge
Fix a crash caused by generation config validation when merging LoRA weights into some models with transformers > 4.36.2
2024-03-25 13:36:08 +08:00
marko1616 c8f0d99704 pass ruff check 2024-03-24 16:12:10 +08:00
marko1616 6f080fdba3 fix Llama lora merge crash 2024-03-24 03:06:11 +08:00
marko1616 51349ea1cc fix Llama lora merge crash 2024-03-24 02:55:23 +08:00
marko1616 c1e2c4ea45 fix Llama lora merge crash 2024-03-24 02:44:35 +08:00
hiyouga 140ad4ad56 fix #2936 2024-03-24 00:43:21 +08:00
hiyouga 7afbc85dae fix #2928 2024-03-24 00:34:54 +08:00
hiyouga a1c8c98c5f fix #2941 2024-03-24 00:28:44 +08:00
hiyouga 564d57aa23 Update wechat.jpg 2024-03-22 14:00:37 +08:00
hoshi-hiyouga ce261fdd64
Merge pull request #2919 from 0xez/main
Update README.md, fix the release date of the paper
2024-03-22 12:12:24 +08:00
0xez be0360303d Update README_zh.md, fix the release date of the paper 2024-03-22 10:41:17 +08:00
0xez 675ba41562 Update README.md, fix the release date of the paper 2024-03-21 22:14:48 +08:00
hiyouga 96702620c4 move file 2024-03-21 17:05:17 +08:00
hiyouga 5eaa50fa01 add citation 2024-03-21 17:04:10 +08:00
hiyouga 0581bfdbc7 paper release 2024-03-21 13:49:17 +08:00
hiyouga bfe7a91289 update readme 2024-03-21 00:48:42 +08:00
hiyouga 8408225162 support fsdp + qlora 2024-03-21 00:36:06 +08:00
hiyouga 3271af2afc add orca_dpo_pairs dataset 2024-03-20 20:09:06 +08:00
hoshi-hiyouga b2dfbd728f
Merge pull request #2905 from SirlyDreamer/main
Follow HF_ENDPOINT environment variable
2024-03-20 18:09:54 +08:00
hiyouga 9bec3c98a2 fix #2777 #2895 2024-03-20 17:59:45 +08:00
hiyouga 7b8f502901 fix #2346 2024-03-20 17:56:33 +08:00
SirlyDreamer e165965341 Follow HF_ENDPOINT environment variable 2024-03-20 08:31:30 +00:00
hoshi-hiyouga a773035709
Merge pull request #2903 from khazic/main
Updated README with new information
2024-03-20 16:13:44 +08:00
khazic 8d10fa71c2 Updated README with new information 2024-03-20 14:38:08 +08:00
khazic 0531dac30d Updated README with new information 2024-03-20 14:21:16 +08:00
刘一博 df9b4fb90a Updated README with new information 2024-03-20 14:11:28 +08:00
hiyouga bea31b9b12 Update wechat.jpg 2024-03-18 16:48:32 +08:00
hiyouga 8e04794b2d fix packages 2024-03-17 22:32:03 +08:00
hiyouga 85c376fc1e fix patcher 2024-03-15 19:18:42 +08:00
hoshi-hiyouga 113cc04719
Merge pull request #2849 from S3Studio/DockerizeSupport
Improve Dockerize support
2024-03-15 19:16:02 +08:00
hiyouga 6bc2c23b6d fix export 2024-03-15 15:06:30 +08:00
S3Studio e75407febd Use official Nvidia base image
Note that the flash-attn library is installed in this image and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, the following exception will be raised during training:
FlashAttention only supports Ampere GPUs or newer.
So if the --flash_attn flag is not set, the qwen model's config needs an additional patch that changes the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
S3Studio 6a5693d11d improve Docker build and runtime parameters
Modify installation method of extra python library.
Utilize shared memory of the host machine to increase training performance.
2024-03-15 08:57:46 +08:00
hiyouga 6ebde4f23e tiny fix 2024-03-14 21:19:06 +08:00
hiyouga 3b4a59bfb1 fix export 2024-03-14 18:17:01 +08:00
hiyouga 8172530d54 fix bug 2024-03-13 23:55:31 +08:00
hiyouga 714d936dfb fix bug 2024-03-13 23:43:42 +08:00
hiyouga 72367307df improve lora+ impl. 2024-03-13 23:32:51 +08:00
hoshi-hiyouga 4e5e99af43
Merge pull request #2830 from qibaoyuan/lora_plus
[FEATURE]: ADD LORA+ ALGORITHM
2024-03-13 20:15:46 +08:00
齐保元 a0965cd62c [FEATURE]: ADD LORA+ ALGORITHM 2024-03-13 19:43:27 +08:00
hiyouga dfd451b722 Update wechat.jpg 2024-03-13 19:03:00 +08:00
hiyouga 0b4a5bf509 fix #2817 2024-03-13 12:42:03 +08:00
hiyouga b9f87cdc11 fix #2802 2024-03-13 12:33:45 +08:00
hiyouga 96ce76cd27 fix kv cache 2024-03-13 01:21:50 +08:00
hiyouga 19ef482649 support QDoRA 2024-03-12 22:12:42 +08:00
hiyouga 70a3052dd8 patch for gemma cpt 2024-03-12 21:21:54 +08:00
hiyouga 60cc17f3a8 fix plot issues 2024-03-12 18:41:35 +08:00
hiyouga b3247d6a16 support olmo 2024-03-12 18:30:38 +08:00
hiyouga 8d8956bad5 fix #2802 2024-03-12 17:08:34 +08:00