Commit Graph

1023 Commits

Author SHA1 Message Date
hiyouga 85c376fc1e fix patcher 2024-03-15 19:18:42 +08:00
hoshi-hiyouga 113cc04719
Merge pull request #2849 from S3Studio/DockerizeSupport
Improve Dockerize support
2024-03-15 19:16:02 +08:00
hiyouga 6bc2c23b6d fix export 2024-03-15 15:06:30 +08:00
S3Studio e75407febd Use official Nvidia base image
Note that the flash-attn library is installed in this image, and the Qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, the following exception will be raised during training:
FlashAttention only supports Ampere GPUs or newer.
Therefore, if the --flash_attn flag is not set, an additional patch to the Qwen model's config is needed to change the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
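The config patch described in the commit message above can be sketched as follows. This is a minimal illustration, not the repository's actual patcher code. The function name `patch_qwen_flash_attn` is hypothetical, a `SimpleNamespace` stands in for the real model config, and the check `model_type == "qwen"` is an assumption about how the Qwen config identifies itself.

```python
from types import SimpleNamespace


def patch_qwen_flash_attn(config, flash_attn: bool) -> None:
    """Hypothetical helper: stop Qwen from picking up flash-attn implicitly.

    Assumption: the Qwen config identifies itself with model_type == "qwen"
    and exposes a use_flash_attn attribute that defaults to "auto".
    """
    if getattr(config, "model_type", None) != "qwen":
        return  # other models are left untouched
    if not flash_attn:
        # With "auto", the model uses flash-attn whenever it is installed,
        # which raises an error on pre-Ampere GPUs. Pin it to False instead.
        config.use_flash_attn = False


config = SimpleNamespace(model_type="qwen", use_flash_attn="auto")
patch_qwen_flash_attn(config, flash_attn=False)
print(config.use_flash_attn)
```

When the --flash_attn flag is set, the sketch leaves the default "auto" in place, so the model still selects flash-attn on its own.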
S3Studio 6a5693d11d improve Docker build and runtime parameters
Modify the installation method of extra Python libraries.
Use the host machine's shared memory to improve training performance.
2024-03-15 08:57:46 +08:00
hiyouga 6ebde4f23e tiny fix 2024-03-14 21:19:06 +08:00
hiyouga 3b4a59bfb1 fix export 2024-03-14 18:17:01 +08:00
hiyouga 8172530d54 fix bug 2024-03-13 23:55:31 +08:00
hiyouga 714d936dfb fix bug 2024-03-13 23:43:42 +08:00
hiyouga 72367307df improve lora+ impl. 2024-03-13 23:32:51 +08:00
hoshi-hiyouga 4e5e99af43
Merge pull request #2830 from qibaoyuan/lora_plus
[FEATURE]: ADD LORA+ ALGORITHM
2024-03-13 20:15:46 +08:00
齐保元 a0965cd62c [FEATURE]: ADD LORA+ ALGORITHM 2024-03-13 19:43:27 +08:00
hiyouga dfd451b722 Update wechat.jpg 2024-03-13 19:03:00 +08:00
hiyouga 0b4a5bf509 fix #2817 2024-03-13 12:42:03 +08:00
hiyouga b9f87cdc11 fix #2802 2024-03-13 12:33:45 +08:00
hiyouga 96ce76cd27 fix kv cache 2024-03-13 01:21:50 +08:00
hiyouga 19ef482649 support QDoRA 2024-03-12 22:12:42 +08:00
hiyouga 70a3052dd8 patch for gemma cpt 2024-03-12 21:21:54 +08:00
hiyouga 60cc17f3a8 fix plot issues 2024-03-12 18:41:35 +08:00
hiyouga b3247d6a16 support olmo 2024-03-12 18:30:38 +08:00
hiyouga 8d8956bad5 fix #2802 2024-03-12 17:08:34 +08:00
hiyouga 06c97083e1 fix #2803 2024-03-12 16:57:39 +08:00
hiyouga 07f9b754a7 fix #2782 #2798 2024-03-12 15:53:29 +08:00
hoshi-hiyouga c901aa63ff
Merge pull request #2743 from S3Studio/DockerizeSupport
Add dockerize support
2024-03-12 00:05:49 +08:00
hiyouga e874c00906 fix #2775 2024-03-11 00:42:54 +08:00
hiyouga 352693e2dc tiny fix 2024-03-11 00:17:18 +08:00
hiyouga be99799413 update parser 2024-03-10 13:35:20 +08:00
hiyouga 8664262cde support layerwise galore 2024-03-10 00:24:11 +08:00
hiyouga 18ffce36b5 fix #2732 2024-03-09 22:37:16 +08:00
hiyouga bdb496644c allow non-packing pretraining 2024-03-09 22:21:46 +08:00
hiyouga 412c52e325 fix #2766 2024-03-09 21:35:24 +08:00
hiyouga af0e370fb1 use default arg for freeze tuning 2024-03-09 06:08:48 +08:00
hiyouga 818726e9bc add GaLore results 2024-03-09 04:11:55 +08:00
hiyouga 393c2de27c update hardware requirements 2024-03-09 03:58:18 +08:00
hiyouga 4c00bcdcae update examples 2024-03-09 02:30:37 +08:00
hiyouga e8dd38b7fd fix #2756, patch #2746 2024-03-09 02:01:26 +08:00
hoshi-hiyouga 516d0ddc66
Merge pull request #2746 from stephen-nju/main
fix deepspeed ppo RuntimeError
2024-03-09 01:37:00 +08:00
hiyouga 74ff8664d7 Update setup.py 2024-03-09 00:14:48 +08:00
hiyouga 10be2f0ecc fix aqlm version 2024-03-09 00:09:09 +08:00
hiyouga 8a45213440 fix example params 2024-03-08 20:41:43 +08:00
stephen_zhu aa71571b77 update 2024-03-08 12:47:44 +08:00
stephen cdb7f82869 fix ppo runtime error 2024-03-08 11:48:26 +08:00
S3Studio 3d911ae713 Add dockerize support
Already tested with the Qwen:1.8B model and the alpaca_data_zh dataset. Some Python libraries were added to the Dockerfile in response to exception messages raised during testing.
2024-03-08 10:47:28 +08:00
hiyouga 4a2cc60b94 update readme 2024-03-08 03:06:21 +08:00
hiyouga 5d956e2a51 fix chat engine, update webui 2024-03-08 03:01:53 +08:00
hiyouga 5cd4947650 Update setup.py 2024-03-08 01:23:00 +08:00
hiyouga 0ac6b40a47 update galore args 2024-03-08 01:17:32 +08:00
hiyouga 33a4c24a8a fix galore 2024-03-08 00:44:51 +08:00
hiyouga 57452a4aa1 add Yi-9B model 2024-03-07 23:11:57 +08:00
hiyouga 7230e1177d add galore examples 2024-03-07 22:53:45 +08:00