improve doc

This commit is contained in:
shengdinghu 2022-10-25 03:10:26 +00:00
parent 0430b35f25
commit 3307f4061c
7 changed files with 134 additions and 154 deletions


@ -32,6 +32,7 @@ OpenDelta is a toolkit for parameter-efficient tuning methods (we dub it as *del
![How PLM changes using Delta-tuning](docs/source/imgs/demo.gif)
## News
- **2022.10.25** Release v0.3.2. Support [BMTrain]()! Improve docs. Add inspect utilities.
- **2022.10.14** Release v0.3.0. We make the default configuration of each delta tuning method (i.e., the positions where it is attached) friendlier to use! If a custom model contains one of our supported models as a submodule, the default configuration is also available. Other key changes can be seen in [Update Log](https://opendelta.readthedocs.io/en/latest/notes/update.html#version-0-3-0)
- **2022.10.10** Merge a long-developed branch v0.2.4 into the master branch. Key updates are (1) an example unifying the delta tuning paradigm and the prompt-tuning paradigm; and (2) support for [Delta Center](https://www.openbmb.org/toolKits/deltacenter), whose webpage is still under construction. Details can be seen in [Update Log](https://opendelta.readthedocs.io/en/latest/notes/update.html#version-0-2-4)
- **2022.03.24** We noticed several bugs in Soft Prompt Tuning and Prefix Tuning, mainly due to their need to customize attention ids and token_type_ids; we are fixing them! For now, please use the other methods, since they are more stable and perform better.
@ -40,50 +41,32 @@ OpenDelta is a toolkit for parameter-efficient tuning methods (we dub it as *del
- **2022.02.16** Support [regular expression](https://opendelta.readthedocs.io/en/latest/notes/namebasedaddr.html#regexexpr) in name-based addressing.
## Installation
create a virtualenv (optional)
1. create a virtualenv (optional)
```shell
conda create -n opendelta_env python=3.8
conda activate opendelta_env
```
### Using Pip
2. install the latest version
```bash
pip install git+https://github.com/thunlp/OpenDelta.git
```
Install OpenDelta using pip as follows:
```shell
**or** install the latest pip version (more stable)
```bash
pip install opendelta
```
To play with the latest features, you can also install OpenDelta from the source.
### Build from Source
```shell
git clone https://github.com/thunlp/OpenDelta.git
**or** build from source
```bash
git clone git@github.com:thunlp/OpenDelta.git
cd OpenDelta
```
#### Option 1: If you won't modify the code, run
```shell
python setup.py install
```
# python setup.py develop # if you want to do some modifications on the code for your research:
#### Option 2: If you want to modify the code or keep the repo updated by git clone, run
```shell
python setup.py develop
```
#### Tips
- If you want to use a mirror for installing the packages, please change the `index_url` in [setup.cfg](setup.cfg)
- If you encounter a network error when using setup.py, please first install the dependencies via
```shell
pip install -r requirements.txt && python setup.py develop
```
## Must Try
The following code and comments walk you through the key functionality of OpenDelta. They are also available in [must_try.py](https://github.com/thunlp/OpenDelta/tree/main/examples/unittest/must_try.py)
The following code and comments walk you through the key functionality of OpenDelta. They are also available in [must_try.py](https://github.com/thunlp/OpenDelta/tree/main/examples/unittest/must_try.py) and [must_try.ipynb in colab](https://colab.research.google.com/drive/1Nbe9zxt8LGQnKmtvEs07IN_PznjNCyk4?usp=sharing).
```python
# use transformers as usual.
@ -174,3 +157,5 @@ used models that OpenDelta are sure to support.


@ -21,13 +21,14 @@ OpenDelta's documentation!
notes/installation.md
notes/quickstart.md
notes/custom.md
notes/saveload.md
.. toctree::
:maxdepth: 1
:caption: Advanced Usage
notes/autodelta.md
notes/deltacenter.md
notes/composition.md
notes/pluginunplug.md
notes/withbmtrain.md


@ -4,7 +4,7 @@
Inspired by [Huggingface transformers AutoClasses](https://huggingface.co/docs/transformers/v4.16.2/en/model_doc/auto#transformers.AutoModel), we provide AutoDelta features that let users
1. Easily experiment with different delta models
2. Quickly deploy from a configuration file, especially from the repos in [DeltaHub](https://huggingface.co/DeltaHub).
2. Quickly deploy from a configuration file, especially from the repos in [DeltaCenter](https://examplelink).
## Easily load from a dict, which makes it simple to change the type of delta model.
@ -50,18 +50,41 @@ name: t5lora
## Fast deploy from a finetuned delta checkpoint from DeltaHub
## Fast deploy from a finetuned delta checkpoint from DeltaCenter
```python
delta_model = AutoDeltaModel.from_finetuned("DeltaHub/sst2-t5-base", backbone_model=backbone_model) # TODO: the link may change.
# use transformers as usual.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
t5_tokenizer = AutoTokenizer.from_pretrained("t5-large")
# A running example
inputs_ids = t5_tokenizer.encode("Is Harry Poter wrtten by JKrowling", return_tensors="pt")
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> '<pad><extra_id_0>? Is it Harry Potter?</s>'
```
Load delta model from delta center:
```python
# use existing delta models
from opendelta import AutoDeltaModel, AutoDeltaConfig
# use existing delta models from DeltaCenter
delta = AutoDeltaModel.from_finetuned("thunlp/Spelling_Correction_T5_LRAdapter_demo", backbone_model=t5)
# freeze the whole backbone model except the delta models.
delta.freeze_module()
# visualize the change
delta.log()
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> <pad> Is Harry Potter written by JK Rowling?</s>
```
<div class="admonition note">
<p class="title">**Hash checking**</p>
<p class="title">**Hash check**</p>
Since the delta model only works together with the backbone model,
we automatically check whether you load the delta model the same way it was trained.
</p>
<p>
We calculate the trained model's [md5](http://some_link) and save it to the config. After loading the delta model, we re-calculate the md5 to see whether it has changed.
<p> Note that passing the hash check guarantees the performance, but there are cases where the hash check fails and the performance is still normal, for various reasons that we are investigating. Please consider this feature as a supplement. </p>
<p>Pass `check_hash=False` to disable the hash checking.</p>
</div>
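The idea behind the hash check can be sketched in plain Python. This is only an illustration under stated assumptions, not OpenDelta's implementation: `fingerprint` and `check_hash` are hypothetical helpers, and a dict of lists stands in for the model's parameters.

```python
import hashlib


def fingerprint(named_values):
    """Return an md5 hex digest over the sorted (name, value) pairs."""
    digest = hashlib.md5()
    for name in sorted(named_values):
        digest.update(name.encode("utf-8"))
        digest.update(repr(named_values[name]).encode("utf-8"))
    return digest.hexdigest()


def check_hash(named_values, expected, enabled=True):
    """Return True when the check is disabled or the fingerprints agree."""
    if not enabled:
        return True
    return fingerprint(named_values) == expected


# Hypothetical stand-in for the model the delta was trained with.
trained_backbone = {"encoder.fc1.weight": [0.1, 0.2], "encoder.fc2.weight": [0.3]}
saved_md5 = fingerprint(trained_backbone)  # this value would be stored in the config

same_backbone = dict(trained_backbone)
other_backbone = {"encoder.fc1.weight": [0.1, 0.2], "encoder.fc2.weight": [0.9]}

print(check_hash(same_backbone, saved_md5))                  # fingerprints agree
print(check_hash(other_backbone, saved_md5))                 # fingerprints differ
print(check_hash(other_backbone, saved_md5, enabled=False))  # check skipped
```

Sorting the names makes the digest independent of dict ordering, so two loads of the same model produce the same fingerprint in this sketch.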


@ -10,9 +10,9 @@ model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base")
## STEP 2: Add delta modules
We provide two alternatives to add the delta modules.
### 2.1 Modification based on visualization
Suppose we want to make the feedforward layer of each block our [modification target module](targetmodules).
We should first find out the name of the feedforward layer in the BART model by visualization. <img src="../imgs/hint-icon-2.jpg" height="30px"> *For more about visualization, see [Visualization](visualization).*
### 2.1 Visualize the backbone structure
Delta tuning's core change to the structure of the base model is to decorate (modify) its modules with small delta modules. Suppose we want to treat the feedforward layer of each block as our [target modules](targetmodules). Since **different PLMs name their submodules differently**,
we should first find out the name of the feedforward layer in the BART model by visualization. <img src="../imgs/hint-icon-2.jpg" height="30px"> *For more about visualization, see [Visualization](visualization).*
```python
from opendelta import Visualization
@ -43,26 +43,28 @@ delta_model.log() # This will visualize the backbone after modification and othe
:::{admonition} Try different positions
:class: tip
OpenDelta provides the flexibility to add delta modules to different positions on the backbone model. For example, if you want to move the adapter in the above example to after the layer norm of the feedforward layer, the code should be changed to
OpenDelta provides the flexibility to add delta modules to various positions on the backbone model. For example, if you want to move the adapter in the above example to after the layer norm of the feedforward layer, the code should be changed to
```python
# continue with the BART example, but not used later.
delta_model = AdapterModel(backbone_model=model, modified_modules=['final_layer_norm'], bottleneck_dim=12)
```
The performance may vary due to positional differences, but there is no academic guarantee that one will outperform the other.
The performance may vary due to positional differences, but there is currently no theoretical guarantee that one will outperform the other.
:::
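To make it concrete how a short key such as `'fc2'` or `'final_layer_norm'` can select modules, here is a rough plain-Python sketch. It assumes a tail-matching rule (a key matches when it equals the trailing dotted segments of a module name) and an `[r]` prefix for regular expressions; both are assumptions for illustration, the module names are made up to resemble BART, and the authoritative rules are on the name-based addressing page.

```python
import re


def matches(module_name, key):
    """Assumed rule: "[r]" marks a regex; otherwise match trailing dotted segments."""
    if key.startswith("[r]"):
        return re.search(key[3:], module_name) is not None
    return module_name == key or module_name.endswith("." + key)


def select_modules(module_names, keys):
    """Return the module names matched by at least one key, in input order."""
    return [name for name in module_names if any(matches(name, k) for k in keys)]


# Illustrative module names, not read from a real model.
names = [
    "model.encoder.layers.0.fc1",
    "model.encoder.layers.0.fc2",
    "model.encoder.layers.0.final_layer_norm",
    "model.decoder.layers.0.fc2",
]

print(select_modules(names, ["fc2"]))
print(select_modules(names, ["final_layer_norm"]))
print(select_modules(names, [r"[r]encoder\.layers\.\d+\.fc\d"]))
```

Under these assumptions, the key `'fc2'` picks the `fc2` module of every block, while the regular expression restricts the selection to the encoder.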
:::{admonition} Favored Configurations
:class: tip
Confused by the flexibility that OpenDelta brings? For now, you can refer to the papers for their configurations. We will add [Favored Configurations](favoredconfiguration) soon.
Confused by the flexibility that OpenDelta brings? The default configuration is given by the `default_modified_modules` attribute of each delta model. Generally, the default configurations are already good enough. If you want to squeeze the size of delta models further, you can refer to the following papers.
- [AdapterDrop: On the Efficiency of Adapters in Transformers](https://arxiv.org/abs/2010.11918)
- [Sparse Structure Search for Parameter-Efficient Tuning(Delta Tuning)](https://arxiv.org/abs/2206.07382)
:::
## STEP 3: Freezing parameters
The main part of the backbone model is not automatically frozen (we may add the option in the future). To freeze the main part of the backbone model except the trainable parts (usually the delta parameters), use the [freeze_module](opendelta.basemodel.DeltaBase.freeze_module) method. The `exclude` field obeys the same name-based addressing rules as the `modified_modules` field.
## STEP 3: Freeze parameters
So far the backbone model is still fully tunable. To freeze the main part of the backbone model except the trainable parts (usually the delta parameters), use the [freeze_module](opendelta.basemodel.DeltaBase.freeze_module) method. The syntax of the `exclude` field also obeys the [name-based addressing](namebasedaddr) rules.
```python
# continue with the BART example
delta_model.freeze_module(exclude=["deltas", "layernorm_embedding"], set_state_dict=True)
delta_model.freeze_module(exclude=["deltas", "layernorm_embedding"])
delta_model.log()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
@ -73,7 +75,14 @@ name: afterfreeze
---
```
````
Passing `set_state_dict=True` tells the method to change the `state_dict` of the `backbone_model` so that it keeps only the trainable parts.
Usually we only want to save the trainable part, so we should modify the `state_dict` of the backbone model, which originally contains all the parameters. With `set_state_dict=True`, `model.state_dict()` contains only the trainable parameters.
```python
delta_model.freeze_module(exclude=["deltas", "layernorm_embedding"], set_state_dict=True)
```
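The effect of keeping only the trainable part can be illustrated without PyTorch. The sketch below is a simplified stand-in, not OpenDelta's code: `Param`, `freeze`, and `trainable_state_dict` are hypothetical names, and substring matching replaces the real name-based addressing rules.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a tensor with a requires_grad flag."""
    values: list
    requires_grad: bool = True


def freeze(params, exclude):
    """Freeze every parameter whose name contains none of the excluded keys."""
    for name, param in params.items():
        param.requires_grad = any(key in name for key in exclude)


def trainable_state_dict(params):
    """Keep only the entries that are still trainable."""
    return {name: p.values for name, p in params.items() if p.requires_grad}


# Made-up parameter names that resemble a BART block with an adapter.
params = {
    "model.encoder.layernorm_embedding.weight": Param([1.0]),
    "model.encoder.layers.0.fc2.weight": Param([2.0]),
    "model.encoder.layers.0.fc2.adapter.down_proj.weight": Param([3.0]),
}

freeze(params, exclude=["adapter", "layernorm_embedding"])
compact = trainable_state_dict(params)
print(sorted(compact))
```

Only the adapter and the excluded layer norm remain in `compact`, which is why a checkpoint saved this way is much smaller than the full model.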
## STEP 4: Normal training pipeline
@ -83,21 +92,44 @@ The **model** then can be trained in traditional training scripts. Two things sh
:::{admonition} Note
:class: note
1. No need to change the optimizer, since the optimizer only calculates and stores gradients for parameters with `requires_grad=True`, and the `requires_grad` attribute has been changed during the call to the [freeze_module](opendelta.basemodel.DeltaBase.freeze_module) method.
2. `model.eval()` or `model.train()` should be used when needed to set dropout, etc. The delta model doesn't touch those configurations.
2. `model.eval()` or `model.train()` should be used if we need to enable/disable dropout. OpenDelta doesn't touch those configurations.
:::
## STEP 5: Save/Share the Delta Model
<img src="../imgs/hint-icon-2.jpg" height="30px"> *see [Save a delta model to local, or share with the community](saveload).*
## STEP 5: Save and load the Delta Model
### Option1: Use opendelta interface.
One option is to use the interface we provide. This saves both the configuration of the delta model and the values of all trainable parameters.
```python
delta_model.save_finetuned("some_local_path/")
```
When loading the delta_model, just call the `from_finetuned` method. Note that the loaded model is fully trainable. If you want to continue training it, please call `freeze_module` again.
```python
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base")
from opendelta import AutoDeltaModel
delta_model = AutoDeltaModel.from_finetuned("some_local_path/", backbone_model=model)
```
### Option2: Use pytorch interface.
Another option is to save and load the model in the traditional PyTorch way.
```python
import torch
torch.save(model.state_dict(), "some_local_path/pytorch_model.bin")
```
Then load it into an initialized backbone model with the delta model attached. Remember to use `strict=False`, since the state_dict now contains only the trainable parameters.
```python
import torch
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base")
from opendelta import AdapterModel
delta_model = AdapterModel(backbone_model=model, modified_modules=['fc2'], bottleneck_dim=12)
model.load_state_dict(torch.load("some_local_path/pytorch_model.bin"), strict=False)
```
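Why `strict=False` is needed can be shown with a small stand-in for a non-strict load. This is a plain-Python sketch that mimics the general behaviour over dicts; the `load_state_dict` function below is a hypothetical helper, not PyTorch's method, and the parameter names are made up.

```python
def load_state_dict(target, checkpoint, strict=True):
    """Copy checkpoint entries into target; report missing and unexpected keys."""
    missing = [k for k in target if k not in checkpoint]
    unexpected = [k for k in checkpoint if k not in target]
    if strict and (missing or unexpected):
        # Raise before touching the target, so a failed load changes nothing.
        raise RuntimeError(f"missing={missing}, unexpected={unexpected}")
    for key, value in checkpoint.items():
        if key in target:
            target[key] = value
    return missing, unexpected


# Backbone with a freshly initialized adapter (made-up names and values).
backbone = {
    "fc2.weight": 1.0,
    "fc2.adapter.weight": 0.0,
    "layernorm_embedding.weight": 0.0,
}
# A compact checkpoint that holds only the trainable parameters.
checkpoint = {"fc2.adapter.weight": 0.5, "layernorm_embedding.weight": 0.7}

missing, unexpected = load_state_dict(backbone, checkpoint, strict=False)
print(missing, unexpected)
print(backbone)
```

The frozen backbone weights show up as missing keys, which is expected for a compact checkpoint; a strict load would reject it for exactly that reason.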
### Option3: Save and upload to DeltaCenter.
You can also save the delta model to delta center to share with the community. See [instructions](deltacenter).
(favoredconfiguration)=
## Favored Configuration
Generally, the default configurations are already good enough. If you want to squeeze the size of delta models further, you can refer to the following papers.
- [AdapterDrop: On the Efficiency of Adapters in Transformers](https://arxiv.org/abs/2010.11918)
- [Sparse Structure Search for Parameter-Efficient Tuning(Delta Tuning)](https://arxiv.org/abs/2206.07382)


@ -0,0 +1,35 @@
# DeltaCenter
## Share to Delta Center.
```python
delta_model.save_finetuned("test_delta_model", push_to_dc = True)
```
## Download from Delta Center.
```python
# use transformers as usual.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
t5_tokenizer = AutoTokenizer.from_pretrained("t5-large")
# A running example
inputs_ids = t5_tokenizer.encode("Is Harry Poter wrtten by JKrowling", return_tensors="pt")
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> '<pad><extra_id_0>? Is it Harry Potter?</s>'
```
Load delta model from delta center:
```python
# use existing delta models
from opendelta import AutoDeltaModel, AutoDeltaConfig
# use existing delta models from DeltaCenter
delta = AutoDeltaModel.from_finetuned("thunlp/Spelling_Correction_T5_LRAdapter_demo", backbone_model=t5)
# freeze the whole backbone model except the delta models.
delta.freeze_module()
# visualize the change
delta.log()
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> <pad> Is Harry Potter written by JK Rowling?</s>
```


@ -1,97 +0,0 @@
# Save and Share the Delta
## Space efficient saving without changing the code.
After a modified backbone model is trained, you can save only the trained part without changing any code, because **the state dict of the backbone model has been changed to contain only the trainable parts**
```python
from opendelta import CompacterModel
from transformers import BertForMaskedLM
backbone_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
delta_model = CompacterModel(backbone_model) # modify the default modules.
# freeze module
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
# or
delta_model.freeze_module(exclude=["deltas"])
```
### save the checkpoint.
Now save the backbone_model in the normal way; the checkpoint is **very space efficient**.
```python
# ...
# After some training pipeline
# ...
torch.save(backbone_model.state_dict(), "delta.ckpt")
# the checkpoint size
import os
print("checkpoint size: {:.2f}M".format(os.path.getsize("delta.ckpt")/1024**2))
# checkpoint size: 0.32M
```
### load the checkpoint.
In order to load the checkpoint, you should make sure the backbone model is a modified one (so that it can take in the delta parameters).
Then load the checkpoint with `strict=False`.
```python
backbone_model.load_state_dict(torch.load("delta.ckpt"), strict=False)
# this will return a long string of warnings about the 'missing key'.
# if you want to suppress it, use
# _ = backbone_model.load_state_dict(torch.load("delta.ckpt"), strict=False)
```
## Save/Load the entire model after training.
### save a delta model.
```python
delta_model.save_finetuned("delta_model")
# Configuration saved in delta_model/config.json
# Model weights saved in delta_model/pytorch_model.bin
```
This will save all the trained parameters and the configuration of the delta model to path `delta_model/`
### load a delta model.
```python
backbone_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
delta_model.from_finetuned("delta_model", backbone_model, local_files_only=True)
# passing local_files_only=True will save the time of checking in the web.
```
## Share or download a model to/from the community.
### Share.
```python
delta_model.save_finetuned("test_delta_model", push_to_hub = True)
```
### Download from community.
```python
from transformers import AutoModelForSeq2SeqLM
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
from opendelta import AutoDeltaModel
delta = AutoDeltaModel.from_finetuned("DeltaHub/lora_t5-base_mrpc", backbone_model=t5)
delta.log()
```
<div class="admonition tip">
<p class="title">**Push to Hub**</p>
<p> Currently we only provide the option to push to huggingface model hub.</p>
<p> Before pushing to the hub, you may need to register an account on Huggingface. You can refer to this [tutorial about model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing)
</p>
<p> If your checkpoint is still too large for git, please install [`git-lfs`](https://git-lfs.github.com).
</p>
</div>
:::{admonition} **Sharing with the Community**
:class: tip
If you are satisfied with your checkpoint, do not forget to share your model to <a href="https://huggingface.co/DeltaHub">DeltaHub</a>:
1. Add yourself to DeltaHub with the [public link](https://huggingface.co/organizations/DeltaHub/share/QzkBuLSmlVnNhQqHYnekoTXwSRkoRHBwZA)
2. Be sure to edit your model card to clearly illustrate the delta model before you share.
3. Click `setting` on the model
4. Transfer the model in `rename or transfer this model` section.
:::
## Save & Load for Composition of Delta
<img src="../imgs/todo-icon.jpeg" height="30px"> Currently the save & load methods are not suitable for [composition](composition) of delta models. Please wait for future releases.


@ -1,6 +1,7 @@
# Update Logs and Known Issues
## Version 0.3.2
- We improve the docs.
- We support BMTrain to accelerate training and to parallelize the training of models that are hard to fit on a single GPU. Check [tutorial/2_with_bmtrain.py](https://github.com/thunlp/OpenDelta/tree/main/examples/tutorial/2_with_bmtrain.py)
- We add functionality to [inspect the optimizer](https://github.com/thunlp/OpenDelta/tree/main/opendelta/utils/inspect.py). The user can see the number of trainable parameters in the optimizer and verify that OpenDelta is being used correctly.
- We move the functions to inspect the delta models into [inspect.py](https://github.com/thunlp/OpenDelta/tree/main/opendelta/utils/inspect.py)