hiyouga
85c376fc1e
fix patcher
2024-03-15 19:18:42 +08:00
hoshi-hiyouga
113cc04719
Merge pull request #2849 from S3Studio/DockerizeSupport
...
Improve Dockerize support
2024-03-15 19:16:02 +08:00
hiyouga
6bc2c23b6d
fix export
2024-03-15 15:06:30 +08:00
S3Studio
e75407febd
Use official Nvidia base image
...
Note that the flash-attn library is installed in this image, and the qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, the following exception is raised during training:
FlashAttention only supports Ampere GPUs or newer.
Therefore, when the --flash_attn flag is not set, the qwen model's config must be patched to change the default value of use_flash_attn from "auto" to False.
2024-03-15 08:59:13 +08:00
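The config patch described in commit e75407febd can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `patch_qwen_flash_attn`, its `flash_attn` parameter, and the use of `SimpleNamespace` as a stand-in for the model config are all assumptions made for the example. Only the `use_flash_attn` attribute and its "auto" default come from the commit message.

```python
from types import SimpleNamespace


def patch_qwen_flash_attn(config, flash_attn: bool) -> None:
    """Hypothetical helper: disable flash attention for qwen unless requested.

    `config` is any object exposing `model_type` and `use_flash_attn`
    attributes; `flash_attn` mirrors whether --flash_attn was passed.
    """
    # Only the qwen config carries the `use_flash_attn` switch in this sketch.
    if getattr(config, "model_type", None) != "qwen":
        return
    # The user asked for flash attention explicitly: leave the config alone.
    if flash_attn:
        return
    # Otherwise replace the "auto" default so the library is never selected
    # on a GPU that does not support it.
    if getattr(config, "use_flash_attn", None) == "auto":
        config.use_flash_attn = False


# Stand-in for a loaded qwen config (assumption for the example).
config = SimpleNamespace(model_type="qwen", use_flash_attn="auto")
patch_qwen_flash_attn(config, flash_attn=False)
print(config.use_flash_attn)
```

With the flag unset, the "auto" default is replaced by False, so the unsupported-GPU exception quoted in the commit message is never reached; with the flag set, the config is left unchanged.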
S3Studio
6a5693d11d
improve Docker build and runtime parameters
...
Modify the installation method of the extra Python library.
Use the host machine's shared memory to improve training performance.
2024-03-15 08:57:46 +08:00
hiyouga
6ebde4f23e
tiny fix
2024-03-14 21:19:06 +08:00
hiyouga
3b4a59bfb1
fix export
2024-03-14 18:17:01 +08:00
hiyouga
8172530d54
fix bug
2024-03-13 23:55:31 +08:00
hiyouga
714d936dfb
fix bug
2024-03-13 23:43:42 +08:00
hiyouga
72367307df
improve lora+ impl.
2024-03-13 23:32:51 +08:00
hoshi-hiyouga
4e5e99af43
Merge pull request #2830 from qibaoyuan/lora_plus
...
[FEATURE]: ADD LORA+ ALGORITHM
2024-03-13 20:15:46 +08:00
齐保元
a0965cd62c
[FEATURE]: ADD LORA+ ALGORITHM
2024-03-13 19:43:27 +08:00
hiyouga
dfd451b722
Update wechat.jpg
2024-03-13 19:03:00 +08:00
hiyouga
0b4a5bf509
fix #2817
2024-03-13 12:42:03 +08:00
hiyouga
b9f87cdc11
fix #2802
2024-03-13 12:33:45 +08:00
hiyouga
96ce76cd27
fix kv cache
2024-03-13 01:21:50 +08:00
hiyouga
19ef482649
support QDoRA
2024-03-12 22:12:42 +08:00
hiyouga
70a3052dd8
patch for gemma cpt
2024-03-12 21:21:54 +08:00
hiyouga
60cc17f3a8
fix plot issues
2024-03-12 18:41:35 +08:00
hiyouga
b3247d6a16
support olmo
2024-03-12 18:30:38 +08:00
hiyouga
8d8956bad5
fix #2802
2024-03-12 17:08:34 +08:00
hiyouga
06c97083e1
fix #2803
2024-03-12 16:57:39 +08:00
hiyouga
07f9b754a7
fix #2782 #2798
2024-03-12 15:53:29 +08:00
hoshi-hiyouga
c901aa63ff
Merge pull request #2743 from S3Studio/DockerizeSupport
...
Add dockerize support
2024-03-12 00:05:49 +08:00
hiyouga
e874c00906
fix #2775
2024-03-11 00:42:54 +08:00
hiyouga
352693e2dc
tiny fix
2024-03-11 00:17:18 +08:00
hiyouga
be99799413
update parser
2024-03-10 13:35:20 +08:00
hiyouga
8664262cde
support layerwise galore
2024-03-10 00:24:11 +08:00
hiyouga
18ffce36b5
fix #2732
2024-03-09 22:37:16 +08:00
hiyouga
bdb496644c
allow non-packing pretraining
2024-03-09 22:21:46 +08:00
hiyouga
412c52e325
fix #2766
2024-03-09 21:35:24 +08:00
hiyouga
af0e370fb1
use default arg for freeze tuning
2024-03-09 06:08:48 +08:00
hiyouga
818726e9bc
add GaLore results
2024-03-09 04:11:55 +08:00
hiyouga
393c2de27c
update hardware requirements
2024-03-09 03:58:18 +08:00
hiyouga
4c00bcdcae
update examples
2024-03-09 02:30:37 +08:00
hiyouga
e8dd38b7fd
fix #2756 , patch #2746
2024-03-09 02:01:26 +08:00
hoshi-hiyouga
516d0ddc66
Merge pull request #2746 from stephen-nju/main
...
fix deepspeed ppo RuntimeError
2024-03-09 01:37:00 +08:00
hiyouga
74ff8664d7
Update setup.py
2024-03-09 00:14:48 +08:00
hiyouga
10be2f0ecc
fix aqlm version
2024-03-09 00:09:09 +08:00
hiyouga
8a45213440
fix example params
2024-03-08 20:41:43 +08:00
stephen_zhu
aa71571b77
update
2024-03-08 12:47:44 +08:00
stephen
cdb7f82869
fix ppo runtime error
2024-03-08 11:48:26 +08:00
S3Studio
3d911ae713
Add dockerize support
...
Tested with the Qwen:1.8B model and the alpaca_data_zh dataset. Several Python libraries were added to the Dockerfile in response to exception messages raised during testing.
2024-03-08 10:47:28 +08:00
hiyouga
4a2cc60b94
update readme
2024-03-08 03:06:21 +08:00
hiyouga
5d956e2a51
fix chat engine, update webui
2024-03-08 03:01:53 +08:00
hiyouga
5cd4947650
Update setup.py
2024-03-08 01:23:00 +08:00
hiyouga
0ac6b40a47
update galore args
2024-03-08 01:17:32 +08:00
hiyouga
33a4c24a8a
fix galore
2024-03-08 00:44:51 +08:00
hiyouga
57452a4aa1
add Yi-9B model
2024-03-07 23:11:57 +08:00
hiyouga
7230e1177d
add galore examples
2024-03-07 22:53:45 +08:00