improve doc
@@ -32,6 +32,7 @@ OpenDelta is a toolkit for parameter-efficient tuning methods (we dub it as *delta tuning*)

![How PLM changes using Delta-tuning](docs/source/imgs/demo.gif)
## News

- **2022.10.25** Release v0.3.2. Support [BMTrain](https://github.com/OpenBMB/BMTrain)! Improve docs. Add inspect utilities.
- **2022.10.14** Release v0.3.0. We make the usage of the default configurations of each delta tuning method (i.e., the positions they are attached to) more friendly! If a custom model has our supported models as submodules inside, the default configuration is also available. Other key changes can be seen in the [Update Log](https://opendelta.readthedocs.io/en/latest/notes/update.html#version-0-3-0).
- **2022.10.10** Merge a long-developed branch v0.2.4 into the master branch. Key updates are (1) an example unifying the delta tuning paradigm and the prompt-tuning paradigm, and (2) support for [Delta Center](https://www.openbmb.org/toolKits/deltacenter), whose webpage is still under construction. Details can be seen in the [Update Log](https://opendelta.readthedocs.io/en/latest/notes/update.html#version-0-2-4).
- **2022.03.24** We have noticed several bugs in Soft Prompt Tuning and Prefix Tuning, mainly due to their need to customize attention ids and token_type_ids; we are fixing them! For now, please use the other methods, which are stabler and perform better.

@@ -40,50 +41,32 @@ OpenDelta is a toolkit for parameter-efficient tuning methods (we dub it as *delta tuning*)

- **2022.02.16** Support [regular expressions](https://opendelta.readthedocs.io/en/latest/notes/namebasedaddr.html#regexexpr) in name-based addressing.

## Installation

1. create a virtualenv (optional)
```shell
conda create -n opendelta_env python=3.8
conda activate opendelta_env
```

2. install the latest version
```bash
pip install git+https://github.com/thunlp/OpenDelta.git
```

**or** install the latest pip version (more stable)
```bash
pip install opendelta
```

**or** build from source
```bash
git clone git@github.com:thunlp/OpenDelta.git
cd OpenDelta
python setup.py install
# python setup.py develop  # if you want to do some modifications on the code for your research
```

#### Tips
- If you want to use a mirror for installing the packages, change the `index_url` in [setup.cfg](setup.cfg).
- If you encounter a network error when using setup.py, first install the dependencies via
```shell
pip install -r requirements.txt && python setup.py develop
```

## Must Try

The following code and comments walk you through the key functionality of OpenDelta. It is also available in [must_try.py](https://github.com/thunlp/OpenDelta/tree/main/examples/unittest/must_try.py) and [must_try.ipynb on Colab](https://colab.research.google.com/drive/1Nbe9zxt8LGQnKmtvEs07IN_PznjNCyk4?usp=sharing).

```python
# use transformers as usual.
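# --- from here on: a hedged sketch of how the walkthrough continues; the real
# --- code is in examples/unittest/must_try.py, so treat the details as assumptions
from transformers import AutoModelForSeq2SeqLM
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")

# visualize the backbone to find the names of the modules to modify
from opendelta import Visualization
Visualization(t5).structure_graph()

# attach a LoRA delta; the module names assume T5's naming of its attention projections
from opendelta import LoraModel
delta_model = LoraModel(backbone_model=t5, modified_modules=['SelfAttention.q', 'SelfAttention.v'])

# freeze everything except the delta parameters, then train as usual
delta_model.freeze_module(exclude=["deltas"])
delta_model.log()
```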

@@ -174,3 +157,5 @@ used models that OpenDelta are sure to support.

@@ -21,13 +21,14 @@ OpenDelta's documentation!

   notes/installation.md
   notes/quickstart.md
   notes/custom.md
   notes/saveload.md

.. toctree::
   :maxdepth: 1
   :caption: Advanced Usage

   notes/autodelta.md
   notes/deltacenter.md
   notes/composition.md
   notes/pluginunplug.md
   notes/withbmtrain.md

@@ -4,7 +4,7 @@

Inspired by [Huggingface transformers AutoClasses](https://huggingface.co/docs/transformers/v4.16.2/en/model_doc/auto#transformers.AutoModel), we provide AutoDelta features for users to

1. Easily experiment with different delta models.
2. Fast deploy from a configuration file, especially from the repos in [DeltaCenter](https://examplelink).

## Easily load from a dict, making it easy to change the type of delta model.
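
The concrete example sits further down this page; below is a minimal hedged sketch of the idea (the exact dict keys follow OpenDelta's config format and should be treated as assumptions):

```python
from transformers import AutoModelForSeq2SeqLM
from opendelta import AutoDeltaConfig, AutoDeltaModel

t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")

# build a delta config from a plain dict; switching the delta type only means
# changing the "delta_type" key, while the rest of the pipeline stays identical
config_dict = {"delta_type": "lora", "modified_modules": ["SelfAttention.q", "SelfAttention.v"]}
delta_config = AutoDeltaConfig.from_dict(config_dict)
delta_model = AutoDeltaModel.from_config(delta_config, backbone_model=t5)
```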

@@ -50,18 +50,41 @@ name: t5lora

## Fast deploy from finetuned delta checkpoints from DeltaCenter

```python
# use transformers as usual.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
t5_tokenizer = AutoTokenizer.from_pretrained("t5-large")
# A running example: the input is deliberately misspelled, since the delta below does spelling correction.
inputs_ids = t5_tokenizer.encode("Is Harry Poter wrtten by JKrowling", return_tensors="pt")
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> '<pad><extra_id_0>? Is it Harry Potter?</s>'
```

Load the delta model from DeltaCenter:
```python
# use existing delta models from DeltaCenter
from opendelta import AutoDeltaModel, AutoDeltaConfig
delta = AutoDeltaModel.from_finetuned("thunlp/Spelling_Correction_T5_LRAdapter_demo", backbone_model=t5)
# freeze the whole backbone model except the delta models.
delta.freeze_module()
# visualize the change
delta.log()

t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> <pad> Is Harry Potter written by JK Rowling?</s>
```

<div class="admonition note">
<p class="title">**Hash check**</p>
<p>
Since the delta model only works together with the backbone model, we automatically check whether you load the delta model the same way it was trained.
</p>
<p>
We calculate the trained model's [md5](http://some_link) and save it to the config. After the delta model is loaded, we re-calculate the md5 to see whether it has changed.
</p>
<p> Note that passing the hash check guarantees performance, but there are cases where the hash check fails yet performance is still normal, for various reasons. We are investigating them. Please consider this feature a supplement. </p>
<p> Pass `check_hash=False` to disable the hash check. </p>
</div>
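
For instance, to skip the check when you are certain the pairing is correct (a hedged sketch; treating `check_hash` as a keyword argument of `from_finetuned` is an assumption based on the note above):

```python
# load the delta checkpoint without verifying the backbone's md5
delta = AutoDeltaModel.from_finetuned(
    "thunlp/Spelling_Correction_T5_LRAdapter_demo",
    backbone_model=t5,
    check_hash=False,
)
```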

@@ -10,9 +10,9 @@ model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base")

## STEP 2: Add delta modules
We provide two alternatives to add the delta modules.

### 2.1 Visualize the backbone structure
Delta tuning's core change to the structure of the base model is to decorate (modify) the modules of the base model with small delta modules. Suppose we want to treat the feedforward layer of each block as our [target modules](targetmodules). Since **different PLMs name their submodules differently**, we should first find the name of the feedforward layer in the BART model by visualization. <img src="../imgs/hint-icon-2.jpg" height="30px"> *For more about visualization, see [Visualization](visualization).*

```python
from opendelta import Visualization
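# a hedged sketch of the next step: print the named-module tree of the backbone
# so we can locate the feedforward submodules inside each BART block
Visualization(model).structure_graph()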
```

@@ -43,26 +43,28 @@ delta_model.log() # This will visualize the backbone after modification and other information.

:::{admonition} Try different positions
:class: tip
OpenDelta provides the flexibility to add deltas to various positions on the backbone model. For example, if you want to move the adapter in the above example to after the layer norm of the feed-forward layer, the code should be changed into the following:
```python
# continue with the BART example (this variant is not used later)
delta_model = AdapterModel(backbone_model=model, modified_modules=['final_layer_norm'], bottleneck_dim=12)
```
The performance may vary due to positional differences, but there is currently no theoretical guarantee that one position will outperform another.
:::
:::{admonition} Favored Configurations
:class: tip
Feel confused about the flexibility that OpenDelta brings? The default configuration is the `default_modified_modules` attribute of each delta model. Generally, the default configurations are already good enough. If you want to squeeze the size of delta models further, you can refer to the following papers.

- [AdapterDrop: On the Efficiency of Adapters in Transformers](https://arxiv.org/abs/2010.11918)
- [Sparse Structure Search for Parameter-Efficient Tuning (Delta Tuning)](https://arxiv.org/abs/2206.07382)
:::

## STEP 3: Freeze parameters
So far the backbone model is still fully tunable. To freeze the main part of the backbone model except the trainable parts (usually the delta parameters), use the [freeze_module](opendelta.basemodel.DeltaBase.freeze_module) method. The syntax of the `exclude` field also obeys the [name-based addressing](namebasedaddr) rules.

```python
# continue with the BART example
delta_model.freeze_module(exclude=["deltas", "layernorm_embedding"])
delta_model.log()
```

````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>

@@ -73,7 +75,14 @@ name: afterfreeze
---
```
````

Usually we want to save only the trainable part, so we should modify the `state_dict` of the backbone model, which originally contains all the parameters. With `set_state_dict=True`, `model.state_dict()` contains only the trainable parameters:
```python
delta_model.freeze_module(exclude=["deltas", "layernorm_embedding"], set_state_dict=True)
```
## STEP 4: Normal training pipeline

@@ -83,21 +92,44 @@ The **model** then can be trained in traditional training scripts. Two things should be noted:

:::{admonition} Note
:class: note
1. No need to change the optimizer, since the optimizer only calculates and stores gradients for parameters with `requires_grad=True`, and the `requires_grad` attribute has been changed during the call to the [freeze_module](opendelta.basemodel.DeltaBase.freeze_module) method.
2. `model.eval()` or `model.train()` should be used if we need to enable/disable dropout. OpenDelta doesn't touch those configurations.
:::
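
To make the first point concrete, here is a minimal sketch of an unchanged training loop; `dataloader` and the learning rate are hypothetical placeholders, and `model` is the BART backbone with the adapter attached and frozen as above:

```python
import torch

# the optimizer can be built over all parameters; frozen ones receive no gradients
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for batch in dataloader:  # hypothetical dataloader yielding dicts that include labels
    outputs = model(**batch)
    outputs.loss.backward()  # gradients flow only into the delta parameters
    optimizer.step()
    optimizer.zero_grad()
```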

## STEP 5: Save and load the Delta Model
### Option 1: Use the opendelta interface.
One option is to use our provided interface. This saves both the configuration of the delta model and all trainable parameters.
```python
delta_model.save_finetuned("some_local_path/")
```
When loading the delta model, just call the `from_finetuned` method. Note that the loaded model is fully trainable. If you want to continue training it, please use `freeze_module` again.
```python
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base")
from opendelta import AutoDeltaModel
delta_model = AutoDeltaModel.from_finetuned("some_local_path/", backbone_model=model)
```
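
For instance, to make it parameter-efficient again before further training (the same call as in STEP 3):

```python
# re-freeze the backbone so only the delta parameters stay trainable
delta_model.freeze_module(exclude=["deltas", "layernorm_embedding"])
```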

### Option 2: Use the PyTorch interface.
Another option is to save and load the model in the traditional PyTorch way:
```python
import torch
torch.save(model.state_dict(), "some_local_path/pytorch_model.bin")
```
Then load it into an initialized backbone model with the delta model attached. Remember to use `strict=False`, since the state_dict now contains only the trainable parameters.

```python
import torch
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base")
from opendelta import AdapterModel
delta_model = AdapterModel(backbone_model=model, modified_modules=['fc2'], bottleneck_dim=12)
model.load_state_dict(torch.load("some_local_path/pytorch_model.bin"), strict=False)
```

### Option 3: Save and upload to DeltaCenter.
You can also save the delta model to DeltaCenter to share with the community, as sketched below. See the [instructions](deltacenter).
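
A one-line sketch, mirroring the DeltaCenter page further down, which uses the `push_to_dc` flag:

```python
# save locally and push the delta checkpoint to DeltaCenter in one call
delta_model.save_finetuned("test_delta_model", push_to_dc=True)
```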
(favoredconfiguration)=
## Favored Configuration

Generally, the default configurations are already good enough. If you want to squeeze the size of delta models further, you can refer to the following papers.

- [AdapterDrop: On the Efficiency of Adapters in Transformers](https://arxiv.org/abs/2010.11918)
- [Sparse Structure Search for Parameter-Efficient Tuning (Delta Tuning)](https://arxiv.org/abs/2206.07382)

@@ -0,0 +1,35 @@

# DeltaCenter

## Share to Delta Center.
```python
delta_model.save_finetuned("test_delta_model", push_to_dc = True)
```

## Download from Delta Center.
```python
# use transformers as usual.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
t5_tokenizer = AutoTokenizer.from_pretrained("t5-large")
# A running example: the input is deliberately misspelled, since the delta below does spelling correction.
inputs_ids = t5_tokenizer.encode("Is Harry Poter wrtten by JKrowling", return_tensors="pt")
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> '<pad><extra_id_0>? Is it Harry Potter?</s>'
```

Load the delta model from DeltaCenter:
```python
# use existing delta models from DeltaCenter
from opendelta import AutoDeltaModel, AutoDeltaConfig
delta = AutoDeltaModel.from_finetuned("thunlp/Spelling_Correction_T5_LRAdapter_demo", backbone_model=t5)
# freeze the whole backbone model except the delta models.
delta.freeze_module()
# visualize the change
delta.log()

t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> <pad> Is Harry Potter written by JK Rowling?</s>
```

@@ -1,97 +0,0 @@

# Save and Share the Delta

## Space-efficient saving without changing the code.
After a modified backbone model is trained, you can save only the trained part without changing any code, because **the state dict of the backbone model has been changed to contain only the trainable parts**.

```python
from opendelta import CompacterModel
from transformers import BertForMaskedLM
backbone_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
delta_model = CompacterModel(backbone_model) # modify the default modules.

# freeze module
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
# or
delta_model.freeze_module(exclude=["deltas"])
```

### save the checkpoint.
Now save the backbone_model in the normal way; the checkpoint is **very space-efficient**.

```python
# ...
# After some training pipeline
# ...
import torch
torch.save(backbone_model.state_dict(), "delta.ckpt")

# the checkpoint size
import os
print("checkpoint size: {:.2f}M".format(os.path.getsize("delta.ckpt")/1024**2))
# checkpoint size: 0.32M
```

### load the checkpoint.
In order to load the checkpoint, you should make sure the backbone model is a modified one (so that it can take in the delta parameters).
Then load the checkpoint with `strict=False`.
```python
backbone_model.load_state_dict(torch.load("delta.ckpt"), strict=False)
# this will return a long string of warnings about the 'missing keys'.
# if you want to suppress them, use
# _ = backbone_model.load_state_dict(torch.load("delta.ckpt"), strict=False)
```

## Save/Load the entire model after training.

### save a delta model.
```python
delta_model.save_finetuned("delta_model")
# Configuration saved in delta_model/config.json
# Model weights saved in delta_model/pytorch_model.bin
```
This will save all the trained parameters and the configuration of the delta model to the path `delta_model/`.

### load a delta model.

```python
backbone_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
delta_model.from_finetuned("delta_model", backbone_model, local_files_only=True)
# passing local_files_only=True will save the time of checking online.
```

## Share or download a model to/from the community.

### Share.
```python
delta_model.save_finetuned("test_delta_model", push_to_hub = True)
```

### Download from community.
```python
from transformers import AutoModelForSeq2SeqLM
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
from opendelta import AutoDeltaModel
delta = AutoDeltaModel.from_finetuned("DeltaHub/lora_t5-base_mrpc", backbone_model=t5)
delta.log()
```

<div class="admonition tip">
<p class="title">**Push to Hub**</p>
<p> Currently we only provide the option to push to the Hugging Face model hub. </p>
<p> Before pushing to the hub, you may need to register an account on Hugging Face. You can refer to this [tutorial about model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing).
</p>
<p> In some cases, your checkpoint may still be too large for git; please install [`git-lfs`](https://git-lfs.github.com).
</p>
</div>

:::{admonition} **Sharing with the Community**
:class: tip
If you are satisfied with your checkpoint, do not forget to share your model to <a href="https://huggingface.co/DeltaHub">DeltaHub</a>:
1. Add yourself to DeltaHub with the [public link](https://huggingface.co/organizations/DeltaHub/share/QzkBuLSmlVnNhQqHYnekoTXwSRkoRHBwZA).
2. Be sure to edit your model card to clearly illustrate the delta model before you share.
3. Click `settings` on the model page.
4. Transfer the model in the `rename or transfer this model` section.
:::

## Save & Load for Composition of Delta

<img src="../imgs/todo-icon.jpeg" height="30px"> Currently the save & load methods are not suitable for [composition](composition) of delta models. Please wait for future releases.

@@ -1,6 +1,7 @@

# Update Logs and Known Issues
## Version 0.3.2

- We improve the docs.
- We support BMTrain to accelerate the training and to parallelize the training of models that are hard to fit in a single GPU. Check [tutorial/2_with_bmtrain.py](https://github.com/thunlp/OpenDelta/tree/main/examples/tutorial/2_with_bmtrain.py).
- We add a functionality to [inspect the optimizer](https://github.com/thunlp/OpenDelta/tree/main/opendelta/utils/inspect.py). The user can see the number of trainable parameters in the optimizer and verify that opendelta is being used correctly.
- We move the functions to inspect the delta models into [inspect.py](https://github.com/thunlp/OpenDelta/tree/main/opendelta/utils/inspect.py).