1. update the version of pre-built bitsandbytes library

2. add pre-built flash-attn library
2024-02-20 11:26:22 +08:00 · 95f53a46bd
parent ba998c67ab
commit 95f53a46bd
2 changed files with 6 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -267,6 +267,8 @@ If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you wi
 pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl
 ```

+To enable Flash Attention on the Windows platform, you need to install the precompiled `flash-attn` library, which supports CUDA 12.1 to 12.2. Please download the corresponding version from [flash-attention](https://github.com/bdashore3/flash-attention/releases) based on your requirements.
+
 ### Use ModelScope Hub (optional)

 If you have trouble with downloading models and datasets from Hugging Face, you can use LLaMA-Factory together with ModelScope in the following manner.
--- a/README_zh.md
+++ b/README_zh.md
@@ -261,12 +261,14 @@ cd LLaMA-Factory
 pip install -r requirements.txt
 ```

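-To enable quantized LoRA (QLoRA) on the Windows platform, you need to install the precompiled `bitsandbytes` library, which supports CUDA 11.1 to 12.1.
+To enable quantized LoRA (QLoRA) on the Windows platform, you need to install the precompiled `bitsandbytes` library, which supports CUDA 11.1 to 12.2.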

 ```bash
-pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl
+pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.40.0-py3-none-win_amd64.whl
 ```

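+To enable Flash Attention on the Windows platform, you need to install the precompiled `flash-attn` library, which supports CUDA 12.1 to 12.2. Please download and install the corresponding version from [flash-attention](https://github.com/bdashore3/flash-attention/releases) based on your requirements.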
+
 ### Use ModelScope Hub (optional)

 If you have trouble downloading models and datasets from Hugging Face, you can use ModelScope in the following manner.
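
The README text in this commit states two CUDA ranges for the prebuilt Windows wheels: 11.1 to 12.2 for `bitsandbytes` and 12.1 to 12.2 for `flash-attn`. A minimal sketch of checking a local CUDA version against those ranges might look like the following. The helper names `parse_version` and `wheel_supports_cuda` and the `SUPPORTED_CUDA` table are hypothetical and not part of LLaMA-Factory, and the ranges are assumed to be inclusive as the wording suggests.

```python
# Hypothetical helper, not part of LLaMA-Factory: checks a CUDA version
# string against the ranges quoted in the README text of this commit.

# Ranges copied from the README wording; assumed to be inclusive.
SUPPORTED_CUDA = {
    "bitsandbytes": ("11.1", "12.2"),
    "flash-attn": ("12.1", "12.2"),
}


def parse_version(text):
    """Turn '12.1' or '12.1.105' into (12, 1) so versions compare numerically."""
    parts = [int(part) for part in text.strip().split(".")]
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else 0  # treat '12' as 12.0
    return (major, minor)


def wheel_supports_cuda(package, cuda_version):
    """Return True if cuda_version falls inside the stated range for package."""
    low, high = SUPPORTED_CUDA[package]
    return parse_version(low) <= parse_version(cuda_version) <= parse_version(high)


if __name__ == "__main__":
    for version in ("11.8", "12.1", "12.3"):
        for package in SUPPORTED_CUDA:
            ok = wheel_supports_cuda(package, version)
            print(f"CUDA {version} / {package}: {'ok' if ok else 'outside stated range'}")
```

If PyTorch is installed, `torch.version.cuda` reports the CUDA version that PyTorch was built against (it is `None` on CPU-only builds), and that string could be passed as `cuda_version`. The check only mirrors the README wording; the wheel release pages remain the authority on which builds exist.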