temporary add

This commit is contained in:
shengdinghu 2022-10-20 10:16:05 +00:00
commit e0de6b02ad
180 changed files with 5407 additions and 1425 deletions

23
.gitignore vendored
View File

@ -35,4 +35,27 @@ log.txt
**/examples/examples_bmtrain/BMPretrain **/examples/examples_bmtrain/BMPretrain
**/examples/examples_bmtrain/BigModels/BigModels/results **/examples/examples_bmtrain/BigModels/BigModels/results
**/Delta_Memory/ **/Delta_Memory/
**/output/
**/thunlp/
**/saved_ckpts/
DeltaCenter-Python-Client/
backbone_structure
delta_checkpoints
gitop.sh
load_dataset_and_model.ipynb
load_model.py
scripts
t.py
t.sh
!examples/examples_prompt/configs/*/*.json
!examples/examples_prompt/configs/**
**/delta_checkpoints/
**/outputs/
**/unittest/**
!unittest/**.py
!unittest/**.sh

109
README.md
View File

@ -26,16 +26,18 @@
OpenDelta is a toolkit for parameter-efficient tuning methods (we dub it as *delta tuning*), by which users could flexibly assign (or add) a small amount parameters to update while keeping the most paramters frozen. By using OpenDelta, users could easily implement prefix-tuning, adapters, Lora, or any other types of delta tuning with preferred PTMs. OpenDelta is a toolkit for parameter-efficient tuning methods (we dub it as *delta tuning*), by which users could flexibly assign (or add) a small amount parameters to update while keeping the most paramters frozen. By using OpenDelta, users could easily implement prefix-tuning, adapters, Lora, or any other types of delta tuning with preferred PTMs.
- Our repo is tested on Python 3.8 and PyTorch 1.9.0. Lower version may also be supported. - The latest version of OpenDelta is tested on Python==3.8.13, PyTorch==1.12.1, transformers==4.22.2. Other versions are likely to be supported as well. If you encounter bugs when using your own package versions, please raise an issue, we will look into it as soon as possible.
- **A demo of using Opendelta to modify the PLM (E.g., BART).** - **A demo of using Opendelta to modify the PLM (E.g., BART).**
![How PLM changes using Delta-tuning](docs/source/imgs/demo.gif) ![How PLM changes using Delta-tuning](docs/source/imgs/demo.gif)
## Updates ## News
- 2022.03.24 We notice several bugs in Soft Prompt Tuning and Prefix Tuning, mainly due to their need to customize attention ids, token_type_ids, we are fixing it! Currently, please use the other methods since they are stabler and better in performance. - **2022.10.14** Release v0.3.0. We make the usage of default configurations of each delta tuning methods (i.e., the position they are attached) more friendly! If a custom model has our supported models as submodules inside, the default configuration is also available. Other key changes can be seen in [Update Log](https://opendelta.readthedocs.io/en/latest/notes/update.html#version-0-3-0)
- 2022.03.20 Add a [colab example](https://colab.research.google.com/drive/1uAhgAdc8Qr42UKYDlgUv0f7W1-gAFwGo?usp=sharing) to illustrate efficient training and space-saving multitask-serving. - **2022.10.10** Merge a long-developed branch v0.2.4 into the master branch. Key updates are (1) the an example unifying the delta tuning paradigm and the prompt-tuning paradigm; (2) and support for [Delta Center](https://www.openbmb.org/toolKits/deltacenter), whose webpage is still under construction. Details can be seen in [Update Log](https://opendelta.readthedocs.io/en/latest/notes/update.html#version-0-2-4)
- 2022.03.20 A new pip version released. - **2022.03.24** We notice several bugs in Soft Prompt Tuning and Prefix Tuning, mainly due to their need to customize attention ids, token_type_ids, we are fixing it! Currently, please use the other methods since they are stabler and better in performance.
- 2022.02.16 Support [regular expression](https://opendelta.readthedocs.io/en/latest/notes/namebasedaddr.html#regexexpr) in named-based addressing. - **2022.03.20** Add a [colab example](https://colab.research.google.com/drive/1uAhgAdc8Qr42UKYDlgUv0f7W1-gAFwGo?usp=sharing) to illustrate efficient training and space-saving multitask-serving.
- **2022.03.20** A new pip version released.
- **2022.02.16** Support [regular expression](https://opendelta.readthedocs.io/en/latest/notes/namebasedaddr.html#regexexpr) in named-based addressing.
## Installation ## Installation
create a virtualenv (optional) create a virtualenv (optional)
@ -72,20 +74,95 @@ python setup.py install
python setup.py develop python setup.py develop
``` ```
## Must Try #### Tips
- If you want to use mirror for installing the packages, please change the `index_url` in [setup.cfg](setup.cfg)
```python - If you encounter network error using setup.py, please firstly install the dependencies via
from transformers import AutoModelForSeq2SeqLM ```shell
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large") pip install -r requirements.txt && python setup.py develop
from opendelta import AutoDeltaModel
delta = AutoDeltaModel.from_finetuned("thunlp/FactQA_T5-large_Adapter", backbone_model=t5)
delta.log()
``` ```
## Verified Supported Models ## Must Try
The following codes and comments walk you through the key functionality of OpenDelta. It is also in [must_try.py](https://github.com/thunlp/OpenDelta/tree/main/examples/unittest/must_try.py)
```python
# use tranformers as usual.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
t5_tokenizer = AutoTokenizer.from_pretrained("t5-large")
# A running example
inputs_ids = t5_tokenizer.encode("Is Harry Poter wrtten by JKrowling", return_tensors="pt")
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> '<pad><extra_id_0>? Is it Harry Potter?</s>'
# use existing delta models
from opendelta import AutoDeltaModel, AutoDeltaConfig
# use existing delta models from DeltaCenter
delta = AutoDeltaModel.from_finetuned("thunlp/Spelling_Correction_T5_LRAdapter_demo", backbone_model=t5)
# freeze the whole backbone model except the delta models.
delta.freeze_module()
# visualize the change
delta.log()
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> <pad> Is Harry Potter written by JK Rowling?</s>
# Now save merely the delta models, not the whole backbone model, to tmp/
delta.save_finetuned(".tmp")
import os; os.listdir(".tmp")
# >>> The state dict size is 1.443 MB
# >>> We encourage users to push their final and public models to delta center to share them with the community!
# reload the model from local url and add it to pre-trained T5.
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
delta1 = AutoDeltaModel.from_finetuned(".tmp", backbone_model=t5)
import shutil; shutil.rmtree(".tmp") # don't forget to remove the tmp files.
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> <pad> Is Harry Potter written by JK Rowling?</s>
# detach the delta models, the model returns to the unmodified status.
delta1.detach()
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> '<pad><extra_id_0>? Is it Harry Potter?</s>'
# use default configuration for cunstomized wrapped models which have PLMs inside. This is a common need for users.
import torch.nn as nn
class WrappedModel(nn.Module):
def __init__(self, inner_model):
super().__init__()
self.inner = inner_model
def forward(self, *args, **kwargs):
return self.inner(*args, **kwargs)
wrapped_model = WrappedModel(WrappedModel(t5))
# say we use LoRA
delta_config = AutoDeltaConfig.from_dict({"delta_type":"lora"})
delta2 = AutoDeltaModel.from_config(delta_config, backbone_model=wrapped_model)
delta2.log()
# >>> root
# -- inner
# -- inner
# ...
# ... lora_A:[8,1024], lora_B:[1024,8]
delta2.detach()
# use a not default configuration
# say we add lora to the last four layer of the decoder of t5, with lora rank=5
delta_config3 = AutoDeltaConfig.from_dict({"delta_type":"lora", "modified_modules":["[r]decoder.*((20)|(21)|(22)|(23)).*DenseReluDense\.wi"], "lora_r":5})
delta3 = AutoDeltaModel.from_config(delta_config3, backbone_model=wrapped_model)
delta3.log()
```
## Verified Default Configurations
- **You can try to use OpenDelta on *any* backbone models based on PyTorch.** - **You can try to use OpenDelta on *any* backbone models based on PyTorch.**
- However, with small chances thatThe interface of the submodules of the backbone model is not supported. Therefore we verified some commonly - However, with small chances that the interface of the submodules of the backbone model is not supported. Therefore we verified some commonly
used models that OpenDelta are sure to support. used models that OpenDelta are sure to support.
- We will keep testing more and more emerging models. - We will keep testing more and more emerging models.
@ -107,3 +184,5 @@ used models that OpenDelta are sure to support.

BIN
dist/opendelta-0.2.0-py3-none-any.whl vendored Normal file

Binary file not shown.

BIN
dist/opendelta-0.2.0.tar.gz vendored Normal file

Binary file not shown.

BIN
dist/opendelta-0.2.1-py3-none-any.whl vendored Normal file

Binary file not shown.

BIN
dist/opendelta-0.2.1.tar.gz vendored Normal file

Binary file not shown.

BIN
dist/opendelta-0.2.2-py3-none-any.whl vendored Normal file

Binary file not shown.

BIN
dist/opendelta-0.2.2.tar.gz vendored Normal file

Binary file not shown.

BIN
dist/opendelta-0.2.3-py3-none-any.whl vendored Normal file

Binary file not shown.

BIN
dist/opendelta-0.2.3.tar.gz vendored Normal file

Binary file not shown.

BIN
dist/opendelta-0.2.4-py3-none-any.whl vendored Normal file

Binary file not shown.

BIN
dist/opendelta-0.2.4.tar.gz vendored Normal file

Binary file not shown.

View File

@ -1,13 +1,17 @@
sphinx_copybutton sphinx_copybutton
sphinx_rtd_theme sphinx_rtd_theme
sphinx_toolbox sphinx_toolbox
torch myst_parser
transformers
sentencepiece==0.1.96 torch>=1.8.0
tqdm==4.62.2 transformers>=4.10.0
openprompt datasets==1.17.0
loralib sentencepiece>=0.1.96
tqdm>=4.62.2
decorator decorator
rich rich
myst_parser web.py
web.py gitpython
scipy # need?
sklearn # need?
delta_center_client==0.0.4

View File

@ -19,7 +19,9 @@ import datetime
import sphinx_rtd_theme import sphinx_rtd_theme
import doctest import doctest
import opendelta import opendelta
import opendelta.delta_models
# -- Project information ----------------------------------------------------- # -- Project information -----------------------------------------------------
@ -29,8 +31,8 @@ copyright = '{}, {}, Licenced under the Apache License, Version 2.0'.format(date
# The full version, including alpha/beta/rc tags # The full version, including alpha/beta/rc tags
release = '0.1.1' release = '0.3.1'
version = "0.1.1" version = "0.3.1"
html_theme = 'sphinx_rtd_theme' html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

View File

@ -1,7 +1,7 @@
OpenDelta's documentation! OpenDelta's documentation!
===================================== =====================================
OpenDelta is a **Plug-and-play** Library of the parameter-efficient fine-tuning ([delta-tuning](WhatisDelta)) technology for pre-trained models. [OpenDelta](https://github.com/thunlp/OpenDelta/) is a **Plug-and-play** Library of the parameter-efficient fine-tuning ([delta-tuning](WhatisDelta)) technology for pre-trained models.
## Essential Advantages: ## Essential Advantages:
@ -35,11 +35,18 @@ OpenDelta is a **Plug-and-play** Library of the parameter-efficient fine-tuning
notes/pluginunplug.md notes/pluginunplug.md
notes/acceleration.md notes/acceleration.md
notes/explored_config.md notes/explored_config.md
.. toctree::
:maxdepth: 1
:caption: Information
notes/citation.md notes/citation.md
notes/update.md
notes/faq.md
.. toctree:: .. toctree::
:maxdepth: 2 :maxdepth: 2
:caption: Package Reference :caption: Documentation
modules/base modules/base
modules/deltas modules/deltas

View File

@ -1,3 +1,12 @@
# Citation # Citation
<img src="../imgs/todo-icon.jpeg" height="30px"> We are working on a technical report. If you find our repo useful, please cite the following paper.
```
@article{ding2022delta,
title={Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models},
author={Ding, Ning and Qin, Yujia and Yang, Guang and Wei, Fuchao and Yang, Zonghan and Su, Yusheng and Hu, Shengding and Chen, Yulin and Chan, Chi-Min and Chen, Weize and others},
journal={arXiv preprint arXiv:2203.06904},
year={2022}
}
```

View File

@ -1,10 +1,9 @@
(composition)=
# Composition of delta models # Composition of delta models
With OpenDelta, you can perform compostion of different delta models. With OpenDelta, you can perform compostion of different delta models.
### Add different deltas to the backbone ## Add different deltas to the backbone
``` ```
from transformers import AutoModelForSequenceClassification from transformers import AutoModelForSequenceClassification
@ -18,14 +17,14 @@ delta_model.log()
```{figure} ../imgs/composition_of_delta.png ```{figure} ../imgs/composition_of_delta.png
--- ---
width: 600px width: 600px
name: defaultmodification name: composition_of_delta
--- ---
``` ```
```` ````
### Even add multiple delta to the same layer ## Even add multiple delta to the same layer
``` ```
from transformers import AutoModelForSequenceClassification from transformers import AutoModelForSequenceClassification
@ -40,7 +39,7 @@ delta_model.log()
```{figure} ../imgs/multiple_to_one_layer.png ```{figure} ../imgs/multiple_to_one_layer.png
--- ---
width: 600px width: 600px
name: defaultmodification name: multiple_to_one_layer
--- ---
``` ```
```` ````

View File

@ -1,11 +1,7 @@
(favoredconfiguration)= (favoredconfiguration)=
# Favored Configuration # Favored Configuration
<img src="../imgs/todo-icon.jpeg" height="30px"> We will add the commonly used configuration of delta models HERE in future. Generally, the default configurations are already good enough. If you want squeeze the size of delta models further, you can refer to the following papers.
E.g. - [AdapterDrop: On the Efficiency of Adapters in Transformers](https://arxiv.org/abs/2010.11918)
- the modified_modules (position of delta), - [Sparse Structure Search for Parameter-Efficient Tuning(Delta Tuning)](https://arxiv.org/abs/2206.07382)
- hyperparameter that are the most efficient
- the favored composition between delta models
Currenlty, use the default setting, explore it by yourself, or refer to existing papers' configuration!

14
docs/source/notes/faq.md Normal file
View File

@ -0,0 +1,14 @@
# FAQs
1. **Why I encounder NotImplementedError in Prefix Tuning?**
This is because we find no easy way to get a unified Prefix Tuning implementation for different attention classes. If you really want to use Prefix Tuning for the models we have not supported, you can implement the ``PrefixLayerYOURMODEL`` on your own or raise a issue to request the feature for your model.
2. **Available Models with default configurations are ..., Please manually add the delta models by speicifying 'modified_modules' based on the visualization of your model structure**
Although most pre-trained models (PTMs) use the transformers archtecture, they are implemented differently. For example, the attention module in GPT2 and BERT is not only named differently, but also implemented in different ways. Common structure mapping mapps the different name conventions of different PTMs into a unified name convention. But there are many PTMs that we do not currently cover. But don't worry! For these models, you can figure out which modules should you modify by simply [visualizing the PTMs](visualization), and then specify the `modified modules` manually (See [name-based addressing](namebasedaddr)).
3. **Requires a dummy_inputs to be passed through the model to understand the dimensionality of each tensor in the computation graph. The {module.__class__.__name__} Class has no dummy_inputs, and automatically created dummy_inputs failed.**
The `dummy_inputs` can be any data that make `backbone_model.forward(**dummy_inputs)` succeed. Only the form and shape of the `dummy_inputs` matter. To set dummy_inputs for your model, please use: `setattr(backbone_model, 'dummy_inputs', some_dummy_inputs)` before initializing `{self.__class__.__name__}`.

View File

@ -38,7 +38,7 @@ We use three key functions to achieve the modifications to the backbone model ou
- **parallel insertion** - **parallel insertion**
Adapters can also be used in a parallel fashion (see [Paper](https://arxiv.org/abs/2110.04366)). Adapters can also be used in a parallel fashion (see [Paper](https://arxiv.org/abs/2110.04366)).
For these methods, use [insert_parallel_module](opendelta.basemodel.DeltaBase.insert_parrellel_module) interface. For these methods, use [insert_parallel_module](opendelta.basemodel.DeltaBase.insert_parallel_module) interface.
:::{admonition} Doc-preserving Insertion :::{admonition} Doc-preserving Insertion

View File

@ -1,4 +1,4 @@
(namebasedaddr)=
# Name-based Addressing # Name-based Addressing
Named based addressing is what set OpenDelta apart from other packages and provide the possibility to be used to a broader range of models (even emerging ones). Named based addressing is what set OpenDelta apart from other packages and provide the possibility to be used to a broader range of models (even emerging ones).
@ -52,7 +52,7 @@ In this case, string `"name_b.0.name_a"` will be the name to address the submodu
Thus when applying a delta model to this toy net. Thus when applying a delta model to this toy net.
``` ```python
from opendelta import AdapterModel from opendelta import AdapterModel
AdapterModel(backbone_model=root, modified_modules=['name_b.0.name_a']) AdapterModel(backbone_model=root, modified_modules=['name_b.0.name_a'])
Visualization(root).structure_graph() Visualization(root).structure_graph()
@ -67,7 +67,7 @@ name: toy-delta
``` ```
```` ````
(targetmodules)=
## Target modules. ## Target modules.
For different delta methods, the operation for the modification target is different. For different delta methods, the operation for the modification target is different.
@ -88,7 +88,7 @@ Handcrafting the full names of submodules can be frustrating. We made some simpl
1. **End-matching** Rules. 1. **End-matching** Rules.
OpenDelta will take every modules that OpenDelta will take every modules that
**ends with** the provided name suffix as the modification [target module](target_module). **ends with** the provided name suffix as the modification [target module](targetmodules).
:::{admonition} Example :::{admonition} Example
:class: tip :class: tip
Taking DistilBert with an classifier on top as an example: Taking DistilBert with an classifier on top as an example:
@ -115,7 +115,7 @@ Handcrafting the full names of submodules can be frustrating. We made some simpl
:::{admonition} Regex in Json Configs :::{admonition} Regex in Json Configs
:class: warning :class: warning
In json, you should write `"\\."` instead of `"\."` for a real dot due to json parsing rules. That is In json, you should write `"\\."` instead of `"\."` for a real dot due to json parsing rules. That is
```json ```
{ {
... ...
"modified_moduls": ['[r][0-5]\\.attention'], "modified_moduls": ['[r][0-5]\\.attention'],
@ -138,7 +138,7 @@ Handcrafting the full names of submodules can be frustrating. We made some simpl
delta_model = LoraModel(backbone_model=model, interactive_modify=True) delta_model = LoraModel(backbone_model=model, interactive_modify=True)
``` ```
by setting `interactive_modify`, a web server will be opened on local host, and the link will be print in the terminal. by setting `interactive_modify`, a web server will be opened on local host, and the link will be print in the terminal, e.g.,
``` ```
http://0.0.0.0:8888/ http://0.0.0.0:8888/

View File

@ -19,7 +19,7 @@ delta_model.log()
```{figure} ../imgs/plugunplug1.png ```{figure} ../imgs/plugunplug1.png
--- ---
width: 800px width: 800px
name: defaultmodification name: plugunplug1
--- ---
``` ```
```` ````
@ -33,7 +33,7 @@ delta_model.log()
```{figure} ../imgs/plugunplug2.png ```{figure} ../imgs/plugunplug2.png
--- ---
width: 800px width: 800px
name: defaultmodification name: plugunplug2
--- ---
``` ```
```` ````
@ -48,7 +48,7 @@ delta_model.log()
```{figure} ../imgs/plugunplug3.png ```{figure} ../imgs/plugunplug3.png
--- ---
width: 800px width: 800px
name: defaultmodification name: plugunplug3
--- ---
``` ```
```` ````
@ -67,7 +67,7 @@ delta_model2.log()
```{figure} ../imgs/plugunplug4.png ```{figure} ../imgs/plugunplug4.png
--- ---
width: 800px width: 800px
name: defaultmodification name: plugunplug4
--- ---
``` ```
```` ````
@ -81,7 +81,7 @@ delta_model.log()
```{figure} ../imgs/plugunplug5.png ```{figure} ../imgs/plugunplug5.png
--- ---
width: 800px width: 800px
name: defaultmodification name: plugunplug5
--- ---
``` ```
```` ````
@ -96,7 +96,7 @@ delta_model.log()
```{figure} ../imgs/plugunplug6.png ```{figure} ../imgs/plugunplug6.png
--- ---
width: 800px width: 800px
name: defaultmodification name: plugunplug6
--- ---
``` ```
```` ````

View File

@ -1,4 +1,3 @@
(saveload)=
# Save and Share the Delta # Save and Share the Delta
## Space efficient saving without changing the code. ## Space efficient saving without changing the code.
@ -95,4 +94,4 @@ If you are satisfied with your checkpoint, do not forget to share your model to
## Save & Load for Composition of Delta ## Save & Load for Composition of Delta
<img src="../imgs/todo-icon.jpeg" height="30px"> Currently save & load method is not suitable for [composition of delta model](compositon). Please wait for future releases. <img src="../imgs/todo-icon.jpeg" height="30px"> Currently save & load method is not suitable for [composition](composition) of delta model. Please wait for future releases.

View File

@ -1,4 +1,4 @@
(unifyname)= (commonstructure)=
# Common Structure Mapping # Common Structure Mapping
@ -41,7 +41,7 @@ Visualize bert-base using a common structure name: The submodules that are not c
```{figure} ../imgs/commonstructure_vis.png ```{figure} ../imgs/commonstructure_vis.png
:width: 600px :width: 600px
:name: transformers_structure :name: commonstructure_vis
``` ```
(mappingexample)= (mappingexample)=

View File

@ -0,0 +1,29 @@
# Update Logs and Known Issues
## Version 0.3.1
- We update [must_try.py](https://github.com/thunlp/OpenDelta/tree/main/examples/unittest/must_try.py) for a simple introduction of the core functionality of OpenDelta.
- Thanks to [Weilin Zhao](https://github.com/Achazwl) We merge a long-developed branch parallel_adapter into the main branch.
## Version 0.3.0
### Updates:
- Add this changelog for a granular record of updates.
- The default configuration of delta models can be applied to more wrapped models.
- There is less need to configure 'modified_modules' for wrapped models like [BertForSequenceClassification](https://huggingface.co/docs/transformers/main/en/model_doc/bert#transformers.BertForSequenceClassification) or even [OpenMatch.DRModel](https://github.com/OpenMatch/OpenMatch/blob/master/src/openmatch/modeling/dense_retrieval_model.py#L37), as long as it has a model we support default configuration inside. **Note that if you customize `modified_modules` by yourself, most pytorch models are supported.**
- LoRA and BitFit models now does not need pseudo data to instantiate the model.
- BitFit models can now support [Conv1D](https://huggingface.co/docs/transformers/v4.23.1/en/internal/modeling_utils#transformers.Conv1D) using default configuration.
- Improve type hint for AutoDeltaModel.
- Fix bugs in documentation.
- Fix small bugs when saving a model without a config attributes.
- Make the default modified modules of adapter-like methods more accurate: attach the adapter-like modules after the output of attention layer and second feed-forward layer, both before the layernorm layers.
- A simple unit test folder containing development-time tests has been added for interested users.
### Known Issues
- SoftPrompt is still not supported for wrapped model if the model has no attribute `get_input_embeddings`.
- Prefix Tuning is still limited to T5, GPT2, Bart, Bert, Roberta.
## Version 0.2.4
### Updates
- examples/examples_seq2seq and examples/examples_text-classification is depreciated and moved to [legacy](https://github.com/thunlp/OpenDelta/tree/main/examples/legacies)
- Thanks to [Zhen Zhang](https://github.com/namezhenzhang), we provide [examples_prompt](https://github.com/thunlp/OpenDelta/tree/main/examples/examples_prompt), as a cleaner and more general framework, which unifies the delta tuning paradigm and the prompt-tuning paradigm. It is still based on [Huggingface Trainers](https://huggingface.co/docs/transformers/main_classes/trainer). In this example framework, the running pipeline is [a unified script](https://github.com/thunlp/OpenDelta/tree/main/examples/examples_prompt/src), the differences in tasks, models, delta tuning models, and even prompt-tuning paradigms are [more modular and be more independent ](https://github.com/thunlp/OpenDelta/tree/main/examples/examples_prompt/backbones). Please try it out!

View File

@ -12,7 +12,7 @@ model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base")
## STEP 2: Add delta modules ## STEP 2: Add delta modules
We provide two alternatives to add the delta modules. We provide two alternatives to add the delta modules.
### 2.1 Modification based on visualization ### 2.1 Modification based on visualization
Suppose we want to make the feedforward layer of each block as our [modification target module](target_module), Suppose we want to make the feedforward layer of each block as our [modification target module](targetmodules),
We should first know what is the name of the feedforward layer in the BART model by visualization. <img src="../imgs/hint-icon-2.jpg" height="30px"> *For more about visualization, see [Visualization](visualization).* We should first know what is the name of the feedforward layer in the BART model by visualization. <img src="../imgs/hint-icon-2.jpg" height="30px"> *For more about visualization, see [Visualization](visualization).*
```python ```python
@ -48,7 +48,7 @@ delta_model.log() # This will visualize the backbone after modification and othe
### 2.2 Use the default modification. ### 2.2 Use the default modification.
We also provide the default modifications of each delta methods for some commonly used PTMs (e.g., BERT, RoBERTA, DistilBERT, T5, GPT2), so the users don't need to specify the submodules to modify. We also provide the default modifications of each delta methods for some commonly used PTMs (e.g., BERT, RoBERTA, DistilBERT, T5, GPT2), so the users don't need to specify the submodules to modify.
The default modifications is achieved by mapping a name of a submodule to it's name on a common transformer structure. <img src="../imgs/hint-icon-2.jpg" height="30px"> *For details about the common structure mapping, see [Common Structure Mapping](unifyname)* The default modifications is achieved by mapping a name of a submodule to it's name on a common transformer structure. <img src="../imgs/hint-icon-2.jpg" height="30px"> *For details about the common structure mapping, see [Common Structure Mapping](commonstructure)*

View File

@ -1,4 +1,3 @@
(visualization)=
# Visualize the Parameters # Visualize the Parameters
When OpenDelta makes modifications to a pretrained model (PTM), it is beneficial to know what your PTM looks like, especially the location of the parameters. When OpenDelta makes modifications to a pretrained model (PTM), it is beneficial to know what your PTM looks like, especially the location of the parameters.

View File

@ -1,24 +1,59 @@
# !!!!This example collection is still under develop, please wait for some time to use it. # Examples of using opendelta together with 🤗 transformers.
## install the repo In this repo, we construct a very general pipeline to train and test a PLM using
🤗 transformers.
The pipeline was constructed together with [openpromptu](https://pypi.org/project/openpromptu/), which is a light and
model-agnostic version of [openprompt](https://github.com/thunlp/OpenPrompt).
## Pool of PLMs
We are going to adapt most of the models in 🤗 transformers
in the repos. The different pipeline, processing, or configurations are specified
in `./backbones/`. You can add your own model in this file to support customized models.
### A example script to run the repo in offline mode
```bash ```bash
cd ../ conda activate [YOURENV]
python setup_seq2seq.py develop PATHBASE=[YOURPATH]
JOBNAME="adapter_t5-base"
DATASET="superglue-cb"
cd $PATHBASE/OpenDelta/examples/examples_prompt/
python configs/gen_t5.py --job $JOBNAME
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1
python src/run.py configs/$JOBNAME/$DATASET.json \
--model_name_or_path [YOURPATH_TO_T5_BASE] \
--tokenizer_name [YOURPATH_TO_T5_BASE] \
--datasets_saved_path [YOURPATH_TO_CB_DATASETS] \
--finetuned_delta_path ${PATHBASE}/delta_checkpoints/ \
--num_train_epochs 20 \
--bottleneck_dim 24 \
--delay_push True
``` ```
This will add `examples_seq2seq` to the environment path of the python lib.
## Generating the json configuration file ## A example of quick testing the repo.
```shell ```bash
python configs/gen_$BACKBONETYPE.py --job $YOURJOB conda activate [YOURENV]
#e.g. python configs/gen_beit.py --job lora_beit-base-patch16-224 PATHBASE=[YOURPATH]
```
The available job configuration (e.g., `--job lora_beit-base-patch16-224`) can be seen from the scripts. You can also
create your only configuration.
JOBNAME="adapter_t5-base"
DATASET="superglue-cb"
## Run the code cd $PATHBASE/OpenDelta/examples/examples_prompt/
``` export TRANSFORMERS_OFFLINE=1
CUDA_VISIBLE_DEVICES=1 python src/run.py configs/lora_beit-base-patch16-224/beans.json export HF_DATASETS_OFFLINE=1
``` export DELTACENTER_OFFLINE=0
python src/test.py configs/$JOBNAME/$DATASET.json \
--model_name_or_path [YOURPATH_TO_T5_BASE] \
--tokenizer_name [YOURPATH_TO_T5_BASE] \
--datasets_saved_path [YOURPATH_TO_CB_DATASETS] \
--finetuned_delta_path thunlp/t5-base_adapter_superglue-cb_20220701171436c80 \
--delta_cache_dir "./delta_checkpoints/" \
--force_download True
```

View File

@ -26,14 +26,14 @@ def preprocess_function(raw_example, **kwargs):
example = InputExample(**raw_example) example = InputExample(**raw_example)
try:
example = verbalizer.wrap_one_example(example) example = verbalizer.wrap_one_example(example)
example, other = template.wrap_one_example(example) example, other = template.wrap_one_example(example)
input_sentence = tokenizer_wrapper.merge_wrapped_example(example) input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
model_inputs = tokenizer(input_sentence, max_length=256, model_inputs = tokenizer(input_sentence, max_length=256,
padding="max_length", truncation=True) padding="max_length", truncation=True)
except:
from IPython import embed; embed(header="Therer")
with tokenizer.as_target_tokenizer(): with tokenizer.as_target_tokenizer():
label = tokenizer(other['tgt_text']).input_ids label = tokenizer(other['tgt_text']).input_ids
@ -43,7 +43,8 @@ def preprocess_function(raw_example, **kwargs):
def get_backbone(model_args, **kwargs): def get_backbone(model_args, **kwargs):
config = AutoConfig.from_pretrained( config = AutoConfig.from_pretrained(
model_args.config_name if model_args.config_name else model_args.model_name_or_path, # model_args.config_name if model_args.config_name else model_args.model_name_or_path,
model_args.model_name_or_path,
cache_dir=model_args.cache_dir, cache_dir=model_args.cache_dir,
revision=model_args.model_revision, revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None, use_auth_token=True if model_args.use_auth_token else None,

View File

@ -8,7 +8,6 @@ from transformers import (
AutoFeatureExtractor, AutoFeatureExtractor,
AutoModelForImageClassification, AutoModelForImageClassification,
) )
from transformers import ViTFeatureExtractor
from transformers import Trainer as HfTrainer from transformers import Trainer as HfTrainer
import torch.nn as nn import torch.nn as nn
@ -26,9 +25,10 @@ def get_prompts(task, tokenizer, data_args, template_id="0", verbalizer_id="0"):
def preprocess_function(raw_example, **kwargs): def preprocess_function(raw_example, **kwargs):
# from IPython import embed; embed(header="Therefa") # from IPython import embed; embed(header="Therefa")
tokenizer = kwargs['tokenizer'] tokenizer = kwargs['tokenizer']
model_inputs = tokenizer(raw_example['image'], return_tensors='pt') # print(np.array(raw_example['img']).shape)
model_inputs = tokenizer(np.array(raw_example['image']), return_tensors='pt')
model_inputs['pixel_values'] = model_inputs['pixel_values'].squeeze() model_inputs['pixel_values'] = model_inputs['pixel_values'].squeeze()
model_inputs['labels'] = raw_example['labels'] model_inputs['labels'] = raw_example['label']
return model_inputs return model_inputs
def compute_metrics(eval_preds, dataset_name, eval_metric): def compute_metrics(eval_preds, dataset_name, eval_metric):
@ -55,7 +55,7 @@ def mask_token_func(tokenizer, ith_mask=0):
def get_remove_columns(dataset_features): def get_remove_columns(dataset_features):
# dataset_features.pop("label") # dataset_features.pop("label")
print("remove_columns: {}".format(dataset_features)) # print("remove_columns: {}".format(dataset_features))
return dataset_features return dataset_features
class DataCollator(HfDataCollatorMixin): class DataCollator(HfDataCollatorMixin):

View File

@ -0,0 +1,169 @@
from openpromptu.data_utils import InputExample
import torch
from transformers.data.data_collator import torch_default_data_collator
from transformers.data.data_collator import DataCollatorMixin as HfDataCollatorMixin
from transformers.data.data_collator import DataCollatorForSeq2Seq as DataCollator
import numpy as np
from transformers import (
AutoConfig,
AutoModelForCausalLM,
AutoTokenizer,
)
from transformers import Seq2SeqTrainer as HfSeq2SeqTrainer
import copy
from torch.nn import CrossEntropyLoss
def preprocess_function(raw_example, **kwargs):
tokenizer = kwargs['tokenizer']
data_args = kwargs['data_args']
template = kwargs['template']
verbalizer = kwargs['verbalizer']
tokenizer_wrapper = kwargs['tokenizer_wrapper']
example = InputExample(**raw_example)
# example = verbalizer.wrap_one_example(example)
example, other = template.wrap_one_example(example)
input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
model_inputs = tokenizer(input_sentence, max_length=data_args.max_source_length,
padding="max_length", truncation=True)
return model_inputs
def compute_metrics(eval_preds, dataset_name, eval_metric):
pass
def mask_token_func(tokenizer, ith_mask=0):
return tokenizer.pad_token
def get_remove_columns(dataset_features):
# dataset_features.remove("label")
return dataset_features
def get_prompts(task, tokenizer, data_args, template_id="0", verbalizer_id="0"):
from openpromptu.prompts import GenerationVerbalizer
from openpromptu.prompts import ManualTemplate
from openpromptu import TokenizerWrapper
template = ManualTemplate(text = task.templates_text[template_id])
verbalizer = GenerationVerbalizer(tokenizer=tokenizer, classes = None, label_words=None)
tokenizer_wrapper = TokenizerWrapper(max_seq_length=data_args.max_source_length, tokenizer=tokenizer, truncate_method="balanced", mask_token_func=mask_token_func)
return template, verbalizer, tokenizer_wrapper
def get_backbone(model_args, **kwargs):
config = AutoConfig.from_pretrained(
model_args.config_name if model_args.config_name else model_args.model_name_or_path,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
# config.dropout_rate = 0.0
tokenizer = AutoTokenizer.from_pretrained(
model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
cache_dir=model_args.cache_dir,
use_fast=model_args.use_fast_tokenizer,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
model = AutoModelForCausalLM.from_pretrained(
model_args.model_name_or_path,
from_tf=bool(".ckpt" in model_args.model_name_or_path),
config=config,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
return config, tokenizer, model
class Trainer(HfSeq2SeqTrainer):
def __init__(self, verbalizer=None, eval_task=None, **kwargs):
super().__init__(**kwargs)
self.eval_task = eval_task
self.compute_metrics = self._compute_metrics
def compute_loss(self, model, inputs, return_outputs=False):
labels=copy.deepcopy(inputs['input_ids'])
# labels[labels==self.tokenizer.pad_token_id]=-100
outputs = model(**inputs)
logits = outputs.logits
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
loss_fct = CrossEntropyLoss(ignore_index=self.tokenizer.pad_token_id)
loss = loss_fct(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.long().view(-1))
return (loss, outputs) if return_outputs else loss
def prediction_step(
self,
model, #nn.Module,
inputs, #Dict[str, Union[torch.Tensor, Any]],
prediction_loss_only, #: bool,
ignore_keys, #: Optional[List[str]] = None,
): #-> Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]:
"""
Perform an evaluation step on :obj:`model` using obj:`inputs`.
Subclass and override to inject custom behavior.
Args:
model (:obj:`nn.Module`):
The model to evaluate.
inputs (:obj:`Dict[str, Union[torch.Tensor, Any]]`):
The inputs and targets of the model.
The dictionary will be unpacked before being fed to the model. Most models expect the targets under the
argument :obj:`labels`. Check your model's documentation for all accepted arguments.
prediction_loss_only (:obj:`bool`):
Whether or not to return the loss only.
Return:
Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]: A tuple with the loss, logits and
labels (each being optional).
"""
if not self.args.predict_with_generate or prediction_loss_only:
return super().prediction_step(
model, inputs, prediction_loss_only=prediction_loss_only, ignore_keys=ignore_keys
)
inputs = self._prepare_inputs(inputs)
with torch.no_grad():
labels=copy.deepcopy(inputs['input_ids'])
# labels[labels==self.tokenizer.pad_token_id]=-100
outputs = model(**inputs)
logits = outputs.logits
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous().long()
loss_fct = CrossEntropyLoss(ignore_index=self.tokenizer.pad_token_id)
loss = loss_fct(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.view(-1)).detach().cpu()
loss = torch.where(torch.isnan(loss), torch.full_like(loss, 0), loss)
if prediction_loss_only:
return (loss, None, None)
else:
# non pad label
shift_labels = shift_labels.view(-1).detach().cpu()
nonpad_idx = shift_labels!=self.tokenizer.pad_token_id
shift_labels = shift_labels[nonpad_idx]
# the probability at the corresponding position
shift_logits = shift_logits.view(-1, shift_logits.shape[-1])[nonpad_idx].detach().cpu()
target_position = torch.nn.functional.one_hot(shift_labels,shift_logits.shape[-1]).bool().to(shift_labels.device)
shift_logits = shift_logits.softmax(dim=-1)[target_position]
return (loss, shift_logits, shift_labels)
def _compute_metrics(self, eval_preds):
preds, labels = eval_preds
result = {}
for metric in self.eval_task.metric:
result.update(metric(preds, labels,ignore_index=self.tokenizer.pad_token_id))
average_metric = sum(result.values())/len(result)
result.update({"average_metrics":average_metric})
return result

View File

@ -26,14 +26,13 @@ def preprocess_function(raw_example, **kwargs):
example = InputExample(**raw_example) example = InputExample(**raw_example)
try:
example = verbalizer.wrap_one_example(example) example = verbalizer.wrap_one_example(example)
example, other = template.wrap_one_example(example) example, other = template.wrap_one_example(example)
input_sentence = tokenizer_wrapper.merge_wrapped_example(example) input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
model_inputs = tokenizer(input_sentence, max_length=data_args.max_source_length, model_inputs = tokenizer(input_sentence, max_length=data_args.max_source_length,
padding="max_length", truncation=True) padding="max_length", truncation=True)
except:
from IPython import embed; embed(header="Therer")
with tokenizer.as_target_tokenizer(): with tokenizer.as_target_tokenizer():
label = tokenizer(other['tgt_text']).input_ids label = tokenizer(other['tgt_text']).input_ids
@ -165,7 +164,7 @@ class Trainer(HfSeq2SeqTrainer):
return (loss, generated_tokens, labels) return (loss, generated_tokens, labels)
def _compute_metrics(self, eval_preds): def _compute_metrics(self, eval_preds):
from IPython import embed; embed(header="In compute metrics") # from IPython import embed; embed(header="In compute metrics")
preds, labels = eval_preds preds, labels = eval_preds
decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True) decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)
decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True) decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)

View File

@ -0,0 +1,171 @@
from openpromptu.data_utils import InputExample
import torch
from transformers.data.data_collator import torch_default_data_collator
from transformers.data.data_collator import DataCollatorMixin as HfDataCollatorMixin
from transformers.data.data_collator import DataCollatorForSeq2Seq as DataCollator
import numpy as np
from transformers import (
AutoConfig,
AutoModelForCausalLM,
AutoTokenizer,
)
from transformers import Seq2SeqTrainer as HfSeq2SeqTrainer
import copy
from torch.nn import CrossEntropyLoss
def preprocess_function(raw_example, **kwargs):
tokenizer = kwargs['tokenizer']
data_args = kwargs['data_args']
template = kwargs['template']
verbalizer = kwargs['verbalizer']
tokenizer_wrapper = kwargs['tokenizer_wrapper']
example = InputExample(**raw_example)
# example = verbalizer.wrap_one_example(example)
example, other = template.wrap_one_example(example)
input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
model_inputs = tokenizer(input_sentence, max_length=data_args.max_source_length,
padding="max_length", truncation=True)
return model_inputs
def compute_metrics(eval_preds, dataset_name, eval_metric):
pass
def mask_token_func(tokenizer, ith_mask=0):
return tokenizer.pad_token
def get_remove_columns(dataset_features):
# dataset_features.remove("label")
return dataset_features
def get_prompts(task, tokenizer, data_args, template_id="0", verbalizer_id="0"):
from openpromptu.prompts import GenerationVerbalizer
from openpromptu.prompts import ManualTemplate
from openpromptu import TokenizerWrapper
template = ManualTemplate(text = task.templates_text[template_id])
verbalizer = GenerationVerbalizer(tokenizer=tokenizer, classes = None, label_words=None)
tokenizer_wrapper = TokenizerWrapper(max_seq_length=data_args.max_source_length, tokenizer=tokenizer, truncate_method="tail", mask_token_func=mask_token_func)
return template, verbalizer, tokenizer_wrapper
def get_backbone(model_args, **kwargs):
config = AutoConfig.from_pretrained(
model_args.config_name if model_args.config_name else model_args.model_name_or_path,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
# config.dropout_rate = 0.0
tokenizer = AutoTokenizer.from_pretrained(
model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
cache_dir=model_args.cache_dir,
use_fast=model_args.use_fast_tokenizer,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
if not hasattr(tokenizer,"pad_token") or (hasattr(tokenizer,"pad_token") and tokenizer.pad_token==None):
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
model_args.model_name_or_path,
from_tf=bool(".ckpt" in model_args.model_name_or_path),
config=config,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
return config, tokenizer, model
class Trainer(HfSeq2SeqTrainer):
def __init__(self, verbalizer=None, eval_task=None, **kwargs):
super().__init__(**kwargs)
self.eval_task = eval_task
self.compute_metrics = self._compute_metrics
def compute_loss(self, model, inputs, return_outputs=False):
labels=copy.deepcopy(inputs['input_ids'])
# labels[labels==self.tokenizer.pad_token_id]=-100
outputs = model(**inputs)
logits = outputs.logits
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
loss_fct = CrossEntropyLoss(ignore_index=self.tokenizer.pad_token_id)
loss = loss_fct(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.long().view(-1))
return (loss, outputs) if return_outputs else loss
def prediction_step(
self,
model, #nn.Module,
inputs, #Dict[str, Union[torch.Tensor, Any]],
prediction_loss_only, #: bool,
ignore_keys, #: Optional[List[str]] = None,
): #-> Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]:
"""
Perform an evaluation step on :obj:`model` using obj:`inputs`.
Subclass and override to inject custom behavior.
Args:
model (:obj:`nn.Module`):
The model to evaluate.
inputs (:obj:`Dict[str, Union[torch.Tensor, Any]]`):
The inputs and targets of the model.
The dictionary will be unpacked before being fed to the model. Most models expect the targets under the
argument :obj:`labels`. Check your model's documentation for all accepted arguments.
prediction_loss_only (:obj:`bool`):
Whether or not to return the loss only.
Return:
Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]: A tuple with the loss, logits and
labels (each being optional).
"""
if not self.args.predict_with_generate or prediction_loss_only:
return super().prediction_step(
model, inputs, prediction_loss_only=prediction_loss_only, ignore_keys=ignore_keys
)
inputs = self._prepare_inputs(inputs)
with torch.no_grad():
labels=copy.deepcopy(inputs['input_ids'])
# labels[labels==self.tokenizer.pad_token_id]=-100
outputs = model(**inputs)
logits = outputs.logits
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous().long()
loss_fct = CrossEntropyLoss(ignore_index=self.tokenizer.pad_token_id)
loss = loss_fct(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.view(-1)).detach().cpu()
loss = torch.where(torch.isnan(loss), torch.full_like(loss, 0), loss)
if prediction_loss_only:
return (loss, None, None)
else:
# non pad label
shift_labels = shift_labels.view(-1).detach().cpu()
nonpad_idx = shift_labels!=self.tokenizer.pad_token_id
shift_labels = shift_labels[nonpad_idx]
# the probability at the corresponding position
shift_logits = shift_logits.view(-1, shift_logits.shape[-1])[nonpad_idx].detach().cpu()
target_position = torch.nn.functional.one_hot(shift_labels,shift_logits.shape[-1]).bool().to(shift_labels.device)
shift_logits = shift_logits.softmax(dim=-1)[target_position]
return (loss, shift_logits, shift_labels)
def _compute_metrics(self, eval_preds):
preds, labels = eval_preds
result = {}
for metric in self.eval_task.metric:
result.update(metric(preds, labels,ignore_index=self.tokenizer.pad_token_id))
average_metric = sum(result.values())/len(result)
result.update({"average_metrics":average_metric})
return result

View File

@ -26,14 +26,13 @@ def preprocess_function(raw_example, **kwargs):
example = InputExample(**raw_example) example = InputExample(**raw_example)
try:
example = verbalizer.wrap_one_example(example) example = verbalizer.wrap_one_example(example)
example, other = template.wrap_one_example(example) example, other = template.wrap_one_example(example)
input_sentence = tokenizer_wrapper.merge_wrapped_example(example) input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
model_inputs = tokenizer(input_sentence, max_length=256, model_inputs = tokenizer(input_sentence, max_length=256,
padding="max_length", truncation=True) padding="max_length", truncation=True)
except:
from IPython import embed; embed(header="Therer")
with tokenizer.as_target_tokenizer(): with tokenizer.as_target_tokenizer():
label = tokenizer(other['tgt_text']).input_ids label = tokenizer(other['tgt_text']).input_ids

View File

@ -1,59 +0,0 @@
# the final results will be populated here.{
"evaluate": {
"epoch": 20.0,
"eval_accuracy": 89.2156862745098,
"eval_average_metrics": 90.76168929110105,
"eval_f1": 92.3076923076923,
"eval_loss": 0.16493959724903107,
"eval_runtime": 1.6391,
"eval_samples_per_second": 124.455
},
"repo_name": "DeltaHub/bitfit_t5-base_mrpc",
"test": {
"epoch": 20.0,
"test_accuracy": 88.23529411764706,
"test_average_metrics": 89.97971602434077,
"test_f1": 91.72413793103448,
"test_loss": 0.14968213438987732,
"test_runtime": 1.6344,
"test_samples_per_second": 124.82
}
}
{
"evaluate": {
"epoch": 20.0,
"eval_average_metrics": 52.10265668831534,
"eval_loss": 0.3603779077529907,
"eval_matthews_correlation": 52.10265668831534,
"eval_runtime": 1.0808,
"eval_samples_per_second": 482.046
},
"repo_name": "DeltaHub/bitfit_t5-base_cola",
"test": {
"epoch": 20.0,
"test_average_metrics": 54.209563471221934,
"test_loss": 0.2853100299835205,
"test_matthews_correlation": 54.209563471221934,
"test_runtime": 1.056,
"test_samples_per_second": 494.304
}
}
{
"evaluate": {
"epoch": 20.0,
"eval_average_metrics": 53.80613287067274,
"eval_loss": 0.25723716616630554,
"eval_matthews_correlation": 53.80613287067274,
"eval_runtime": 1.0583,
"eval_samples_per_second": 492.299
},
"repo_name": "DeltaHub/bitfit_t5-base_cola",
"test": {
"epoch": 20.0,
"test_average_metrics": 54.32497579543861,
"test_loss": 0.22327613830566406,
"test_matthews_correlation": 54.32497579543861,
"test_runtime": 1.0556,
"test_samples_per_second": 494.507
}
}

View File

@ -0,0 +1,48 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "beans",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/clip-vit-base-patch32",
"num_classes": 3,
"num_train_epochs": 20,
"output_dir": "outputs/adapter/clip-vit-base-patch32/beans",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_delta_center": true,
"push_to_hub": false,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "beans",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "beans",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/clip-vit-base-patch32",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,53 @@
{
"backbone_model": "opt",
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": true,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "wikitext",
"eval_steps": 200,
"evaluation_strategy": "steps",
"gradient_accumulation_steps":2,
"greater_is_better": false,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 900,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/opt-350m",
"model_path_public": "opt-350m",
"num_train_epochs": 3,
"output_dir": "outputs/adapter/opt-350m/wikitext",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 6,
"per_device_train_batch_size": 6,
"predict_with_generate": true,
"push_to_dc": true,
"push_to_hf": false,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "wikitext",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "wikitext",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/opt-350m",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"warmup_steps": 0,
"modified_modules":["self_attn"]
}

View File

@ -0,0 +1,53 @@
{
"backbone_model": "vit",
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": false,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "beans",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/vit-large-patch16-224-in21k",
"model_path_public": "vit-large-patch16-224-in21k",
"num_classes": 3,
"num_train_epochs": 20,
"output_dir": "outputs/adapter/vit-large-patch16-224-in21k/beans",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": false,
"push_to_dc": true,
"push_to_hf": false,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "beans",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "beans",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/vit-large-patch16-224-in21k",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"warmup_steps": 0,
"modified_modules":["output"]
}

View File

@ -0,0 +1,51 @@
{
"backbone_model": "t5-large",
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": true,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "rte",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/t5-large",
"model_path_public": "t5-large",
"num_train_epochs": 20,
"output_dir": "outputs/bitfit/t5-large/rte",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 16,
"per_device_train_batch_size": 16,
"predict_with_generate": true,
"push_to_dc": true,
"push_to_hf": false,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "rte",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "rte",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/t5-large",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"warmup_steps": 0,
"modified_modules":["attn", "ff", "layer_norm"]
}

View File

@ -0,0 +1,66 @@
{
"backbone_model": "blenderbot",
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": true,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"delta_type": "compacter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "sst2",
"eval_steps": 200,
"evaluation_strategy": "steps",
"factorized_phm": true,
"factorized_phm_rule": false,
"gradient_clip": false,
"greater_is_better": true,
"hypercomplex_adapters": true,
"hypercomplex_division": 4,
"hypercomplex_nonlinearity": "glorot-uniform",
"learn_phm": true,
"learning_rate": 0.003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/blenderbot-3b",
"model_path_public": "blenderbot-3b",
"non_linearity": "gelu_new",
"normalize_phm_weight": false,
"num_train_epochs": 3,
"output_dir": "outputs/compacter/blenderbot-3b/sst2",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"phm_c_init": "normal",
"phm_clamp": false,
"phm_init_range": 0.0001,
"predict_with_generate": true,
"push_to_dc": true,
"push_to_hf": false,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"shared_phm_rule": false,
"split_validation_test": true,
"task_name": "sst2",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "sst2",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/blenderbot-3b",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"use_bias_down_sampler": true,
"use_bias_up_sampler": true,
"warmup_steps": 0,
"modified_modules":["fc2"]
}

View File

@ -0,0 +1,51 @@
{
"backbone_model": "deberta-v2-xlarge",
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": true,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"delta_type": "compacter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "mnli",
"eval_steps": 500,
"evaluation_strategy": "steps",
"greater_is_better": true,
"is_seq2seq": false,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/deberta-v2-xlarge",
"num_train_epochs": 3,
"output_dir": "outputs/compacter/deberta-v2-xlarge/mnli",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": false,
"push_to_dc": true,
"push_to_hub": false,
"save_steps": 500,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "mnli",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "mnli",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/deberta-v2-xlarge",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"warmup_steps": 0,
"modified_modules":["attention"]
}

View File

@ -0,0 +1,51 @@
{
"backbone_model": "long-t5",
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": true,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"delta_type": "compacter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "rte",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/long-t5-tglobal-large",
"model_path_public": "long-t5-tglobal-large",
"num_train_epochs": 20,
"output_dir": "outputs/compacter/long-t5-tglobal-large/rte",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 16,
"per_device_train_batch_size": 16,
"predict_with_generate": true,
"push_to_dc": true,
"push_to_hf": false,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "rte",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "rte",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/long-t5-tglobal-large",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"warmup_steps": 0,
"modified_modules":["attn", "ff", "layer_norm"]
}

View File

@ -71,8 +71,21 @@ AllConfigs['adapter_bart-base'].update({
"output_dir": "outputs/adapter/bart-base/", "output_dir": "outputs/adapter/bart-base/",
}) })
AllConfigs['lora_bart-base'] = copy.deepcopy(BaseConfigs['bart-base']) AllConfigs['parallel_adapter_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
AllConfigs['lora_bart-base'].update({ AllConfigs['parallel_adapter_t5-base'].update({
"delta_type": "parallel_adapter",
"learning_rate": 3e-4,
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"bottleneck_dim":24,
"output_dir": "outputs/parallel_adapter/t5-base/",
})
AllConfigs['lora_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
AllConfigs['lora_t5-base'].update({
"delta_type": "lora", "delta_type": "lora",
"learning_rate": 3e-4, "learning_rate": 3e-4,
"unfrozen_modules": [ "unfrozen_modules": [

View File

@ -2,7 +2,7 @@ import collections
import copy import copy
PATHBASE="/mnt/sfs_turbo/hsd/plm_cache/" PATHBASE="/mnt/sfs_turbo/hsd/plm_cache/"
PATHBASE="/home/hushengding/plm_cache/" # PATHBASE="/home/hushengding/plm_cache/"
AllConfigs = {} AllConfigs = {}

View File

@ -45,11 +45,14 @@ BaseConfigs['t5-base'] = {
"greater_is_better": True, "greater_is_better": True,
"evaluation_strategy": "steps", "evaluation_strategy": "steps",
"overwrite_output_dir": True, "overwrite_output_dir": True,
"push_to_hub": False, "push_to_hf": False,
"push_to_delta_center": True, "push_to_dc": True,
"save_strategy": "steps", "save_strategy": "steps",
"datasets_load_from_disk": True, "datasets_load_from_disk": True,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/" "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"backbone_model": "t5", # use in delta center,
"model_path_public": "t5-base", # use in delta center,
} }
AllConfigs['bitfit_t5-base'] = copy.deepcopy(BaseConfigs['t5-base']) AllConfigs['bitfit_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])

View File

@ -0,0 +1,52 @@
{
"backbone_model": "beit",
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": true,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk",
"delta_type": "lora",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "cifar10",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/beit-large-patch16-224",
"model_path_public": "beit-large-patch16-224",
"num_classes": 10,
"num_train_epochs": 20,
"output_dir": "outputs/lora/beit-large-patch16-224/cifar10",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": false,
"push_to_dc": true,
"push_to_hf": false,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "cifar10",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "cifar10",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/beit-large-patch16-224",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"warmup_steps": 0,
"modified_modules":["query","value"]
}

View File

@ -0,0 +1,52 @@
{
"backbone_model": "gpt-j",
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": true,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"delta_type": "lora",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "wikitext",
"eval_steps": 500,
"evaluation_strategy": "steps",
"gradient_accumulation_steps":4,
"greater_is_better": false,
"learning_rate": 0.00003,
"load_best_model_at_end": true,
"max_source_length": 512,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/gpt-j-6B",
"model_path_public": "gpt-j-6B",
"num_train_epochs": 2,
"output_dir": "outputs/lora/gpt-j-6B/wikitext",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 2,
"per_device_train_batch_size": 2,
"predict_with_generate": true,
"push_to_dc": true,
"push_to_hf": false,
"save_steps": 500,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "wikitext",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "wikitext",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/gpt-j-6B",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"warmup_steps": 0,
"modified_modules":["20.attn.q_proj","21.attn.q_proj","22.attn.q_proj","23.attn.q_proj","24.attn.q_proj","25.attn.q_proj","26.attn.q_proj","27.attn.q_proj"]
}

View File

@ -0,0 +1,52 @@
{
"backbone_model": "roberta-large",
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": true,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"delta_type": "lora",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-boolq",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"is_seq2seq": false,
"learning_rate": 0.0001,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/roberta-large",
"model_path_public": "roberta-large",
"num_train_epochs": 20,
"output_dir": "outputs/lora/roberta-large/superglue-boolq",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": false,
"push_to_hub": false,
"push_to_dc": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "superglue-boolq",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-boolq",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/roberta-large",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"warmup_steps": 0,
"modified_modules":["query","value"]
}

View File

@ -0,0 +1,52 @@
{
"backbone_model": "xlm-roberta-large",
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": true,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"delta_type": "lora",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-wic",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"is_seq2seq": false,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/xlm-roberta-large",
"model_path_public": "xlm-roberta-large",
"num_train_epochs": 20,
"output_dir": "outputs/lora/xlm-roberta-large/superglue-wic",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 16,
"per_device_train_batch_size": 16,
"predict_with_generate": false,
"push_to_dc": true,
"push_to_hub": false,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "superglue-wic",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-wic",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/xlm-roberta-large",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"warmup_steps": 0,
"modified_modules":["query","value"]
}

View File

@ -0,0 +1,52 @@
{
"backbone_model": "gpt2",
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": true,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"delta_type": "low_rank_adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "wikitext",
"eval_steps": 200,
"evaluation_strategy": "steps",
"gradient_accumulation_steps":1,
"greater_is_better": false,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 768,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/gpt2",
"model_path_public": "gpt2",
"num_train_epochs": 2,
"output_dir": "outputs/low_rank_adapter/gpt2/wikitext",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 16,
"per_device_train_batch_size": 16,
"predict_with_generate": true,
"push_to_dc": true,
"push_to_hf": false,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "wikitext",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "wikitext",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/gpt2",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"warmup_steps": 0,
"modified_modules":["attn","mlp"]
}

View File

@ -0,0 +1,51 @@
{
"backbone_model": "bert-large-cased",
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": true,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"delta_type": "prefix",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "rte",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"is_seq2seq": false,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/bert-large-cased",
"num_train_epochs": 20,
"output_dir": "outputs/prefix/bert-large-cased/rte",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 16,
"per_device_train_batch_size": 16,
"predict_with_generate": false,
"push_to_dc": true,
"push_to_hub": false,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "rte",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "rte",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/bert-large-cased",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"warmup_steps": 0,
"modified_modules":["attention"]
}

View File

@ -0,0 +1,51 @@
{
"backbone_model": "bart",
"dataset_config_name": [
"en"
],
"datasets_load_from_disk": true,
"datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
"delta_type": "soft_prompt",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-boolq",
"eval_steps": 500,
"evaluation_strategy": "steps",
"gradient_accumulation_steps":1,
"greater_is_better": true,
"learning_rate": 0.1,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/bart-large",
"model_path_public": "bart-large",
"num_train_epochs": 50,
"output_dir": "outputs/soft_prompt/bart-large/superglue-boolq",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_dc": true,
"push_to_hf": false,
"save_steps": 500,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"soft_token_num":100,
"split_validation_test": true,
"task_name": "superglue-boolq",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-boolq",
"tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/bart-large",
"token_init": true,
"unfrozen_modules": [
"deltas"
],
"warmup_steps": 0
}

View File

@ -93,4 +93,10 @@ class AbstractTask(abc.ABC):
# shuffles the data and samples it. # shuffles the data and samples it.
if n_obs is not None: if n_obs is not None:
dataset = self.subsample(dataset, n_obs) dataset = self.subsample(dataset, n_obs)
return dataset.map(self.preprocessor)
this_method = getattr(self.__class__, 'preprocessor')
base_method = getattr(AbstractTask, 'preprocessor')
if this_method is not base_method:
return dataset.map(self.preprocessor)
else:
return dataset

View File

@ -12,22 +12,16 @@ import logging
import numpy as np import numpy as np
import torch import torch
import re import re
from openprompt.prompts import ManualTemplate, ManualVerbalizer
from openprompt.plms.utils import TokenizerWrapper
from openprompt.data_utils import InputExample
from openprompt.prompts import GenerationVerbalizer
import itertools import itertools
import os
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
from transformers.models.auto.tokenization_auto import tokenizer_class_from_name from transformers.models.auto.tokenization_auto import tokenizer_class_from_name
from typing import List, Dict from typing import List, Dict
from collections import defaultdict from collections import defaultdict
from openprompt.utils import round_list
import warnings import warnings
@ -68,7 +62,8 @@ class COLA(AbstractTask):
} }
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.cola")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.cola")[split]
else: else:
return datasets.load_dataset('glue', 'cola', return datasets.load_dataset('glue', 'cola',
@ -96,7 +91,8 @@ class SST2(AbstractTask):
} }
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.sst2")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.sst2")[split]
else: else:
return datasets.load_dataset('glue', 'sst2', return datasets.load_dataset('glue', 'sst2',
@ -123,10 +119,9 @@ class MRPC(AbstractTask):
} }
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.mrpc")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.mrpc")[split]
else: else:
return datasets.load_dataset('glue', 'mrpc', split=split, script_version="master") return datasets.load_dataset('glue', 'mrpc', split=split, script_version="master")
@ -152,7 +147,8 @@ class QQP(AbstractTask):
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.qqp")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.qqp")[split]
else: else:
return datasets.load_dataset('glue', 'qqp', return datasets.load_dataset('glue', 'qqp',
@ -208,7 +204,8 @@ class MNLI(AbstractTask):
} }
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.mnli")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.mnli")[split]
else: else:
return datasets.load_dataset('glue', 'mnli', split=split, script_version="master") return datasets.load_dataset('glue', 'mnli', split=split, script_version="master")
@ -243,7 +240,8 @@ class QNLI(AbstractTask):
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.qnli")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.qnli")[split]
else: else:
return datasets.load_dataset('glue', 'qnli', split=split, script_version="master") return datasets.load_dataset('glue', 'qnli', split=split, script_version="master")
@ -279,7 +277,8 @@ class RTE(AbstractTask):
} }
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.rte")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.rte")[split]
else: else:
return datasets.load_dataset('glue', 'rte', return datasets.load_dataset('glue', 'rte',
@ -306,7 +305,8 @@ class WNLI(AbstractTask):
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.wnli")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.wnli")[split]
else: else:
return datasets.load_dataset('glue', 'wnli', split=split, script_version="master") return datasets.load_dataset('glue', 'wnli', split=split, script_version="master")
@ -334,7 +334,8 @@ class SuperGLUEBoolQ(AbstractTask):
} }
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.boolq")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.boolq")[split]
else: else:
return datasets.load_dataset('super_glue', 'boolq', split=split, script_version="master") return datasets.load_dataset('super_glue', 'boolq', split=split, script_version="master")
@ -347,8 +348,8 @@ class SuperGLUECB(AbstractTask):
split_to_data_split = {"train": "train", split_to_data_split = {"train": "train",
"validation": "validation", "validation": "validation",
"test": "validation"} "test": "validation"}
metric = [metrics.mean_multiclass_f1(num_classes=3), metrics.accuracy] metric = [metrics.accuracy]
metric_names = ["f1_multiclass", "accuracy"] metric_names = ["accuracy"]
verbalizers = { verbalizers = {
"0":{"0": "yes", "0":{"0": "yes",
@ -361,7 +362,8 @@ class SuperGLUECB(AbstractTask):
} }
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.cb")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.cb")[split]
else: else:
return datasets.load_dataset('super_glue', 'cb', split=split, script_version="master") return datasets.load_dataset('super_glue', 'cb', split=split, script_version="master")
@ -387,7 +389,8 @@ class SuperGLUECOPA(AbstractTask):
} }
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.copa")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.copa")[split]
else: else:
return datasets.load_dataset('super_glue', 'copa', split=split, script_version="master") return datasets.load_dataset('super_glue', 'copa', split=split, script_version="master")
@ -416,7 +419,8 @@ class SuperGLUEMultiRC(AbstractTask):
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.multirc")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.multirc")[split]
else: else:
return datasets.load_dataset('super_glue', 'multirc', split=split, script_version="master") return datasets.load_dataset('super_glue', 'multirc', split=split, script_version="master")
@ -459,7 +463,8 @@ class SuperGLUEWIC(AbstractTask):
} }
def load_dataset(self, split): def load_dataset(self, split):
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.wic")[split] return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.wic")[split]
else: else:
return datasets.load_dataset('super_glue', 'wic', split=split, script_version="master") return datasets.load_dataset('super_glue', 'wic', split=split, script_version="master")
@ -549,13 +554,76 @@ class Beans(AbstractTask):
def load_dataset(self, split): def load_dataset(self, split):
# from IPython import embed; embed(header="beans") # from IPython import embed; embed(header="beans")
if self.data_args.datasets_load_from_disk: offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.wic")[split] if offline == '1':
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/beans")[split]
else: else:
return datasets.load_dataset('beans', split=split, script_version="master") return datasets.load_dataset('beans', split=split, script_version="master")
class Wikitext(AbstractTask):
#wikitext-2-v1
name = "wikitext"
# labels_list = ['angular_leaf_spot', 'bean_rust', "healthy"]
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
metric = [metrics.perplexity]
metric_names = ["perplexity"]
verbalizers = {
"0": {
}
}
templates_text = {
"0": """{"meta":"text"}"""
}
split_valid_to_make_test = True
def load_dataset(self, split):
# from IPython import embed; embed(header="beans")
if self.data_args.datasets_load_from_disk:
return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/wikitext")[split]
else:
return datasets.load_dataset('wikitext','wikitext-2-v1', split=split, script_version="master")
class Cifar10(AbstractTask):
name = "cifar10"
split_to_data_split = {"train": "train",
"validation": "test",
"test": "test"}
metric = [metrics.accuracy]
metric_names = ["accuracy"]
def load_dataset(self, split):
if self.data_args.datasets_load_from_disk:
d = datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/cifar10")[split].select(range(100))
print(d)
return d
else:
return datasets.load_dataset('cifar10', split=split, script_version="master")
# def preprocessor(self, example):
# example_ = {}
# example_["image"] = example["image"]
# example_["labels"] = example["label"]
# return example_
class Fashion_MNIST(AbstractTask):
name = "Fashion-MNIST"
split_to_data_split = {"train": "train",
"validation": "test",
"test": "test"}
metric = [metrics.accuracy]
metric_names = ["accuracy"]
def load_dataset(self, split):
if self.data_args.datasets_load_from_disk:
d = datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/fashion_mnist")[split]
print(d)
return d
else:
return datasets.load_dataset('fashion_mnist', split=split, script_version="master")
TASK_MAPPING = OrderedDict( TASK_MAPPING = OrderedDict(
[ [
@ -575,7 +643,10 @@ TASK_MAPPING = OrderedDict(
('superglue-multirc', SuperGLUEMultiRC), ('superglue-multirc', SuperGLUEMultiRC),
('superglue-wic', SuperGLUEWIC), ('superglue-wic', SuperGLUEWIC),
# ('superglue-record', SuperGLUERecord) # ('superglue-record', SuperGLUERecord)
('beans', Beans) ('beans', Beans),
('wikitext',Wikitext),
('cifar10',Cifar10),
('fashion_mnist',Fashion_MNIST)
] ]
) )

View File

@ -11,6 +11,14 @@ import sklearn.metrics
logger = getLogger(__name__) logger = getLogger(__name__)
def perplexity(outputs, targets,ignore_index=-100):
"""Computes the perplexity accuracy."""
ce = -np.log(outputs).mean()
# ce = F.cross_entropy(torch.Tensor(outputs).view(-1, outputs.shape[-1]), torch.Tensor(targets).view(-1).long(),ignore_index=ignore_index)
return {"perplexity":float(np.exp(ce))}
def accuracy(predictions, targets) -> dict: def accuracy(predictions, targets) -> dict:
"""Computes the average accuracy.""" """Computes the average accuracy."""
return {"accuracy": 100 * ((np.array(predictions) == np.array(targets)).mean())} return {"accuracy": 100 * ((np.array(predictions) == np.array(targets)).mean())}
@ -47,20 +55,20 @@ def spearman_corrcoef(predictions, targets) -> dict:
def spearman_corrcoef(predictions, targets) -> dict: # def spearman_corrcoef(predictions, targets) -> dict:
"""Computes Spearman correlation coefficient.""" # """Computes Spearman correlation coefficient."""
# TODO: we need to do postprocessors in a clean way for each dataset. # # TODO: we need to do postprocessors in a clean way for each dataset.
from examples_seq2seq.data_processors.postprocessors import string_to_float # from examples_seq2seq.data_processors.postprocessors import string_to_float
targets = [string_to_float(target) for target in targets] # targets = [string_to_float(target) for target in targets]
predictions= [string_to_float(prediction) for prediction in predictions] # predictions= [string_to_float(prediction) for prediction in predictions]
spearman_corrcoef = 100 * scipy.stats.spearmanr(targets, predictions)[0] # spearman_corrcoef = 100 * scipy.stats.spearmanr(targets, predictions)[0]
# Note that if all the predictions will be the same, spearman # # Note that if all the predictions will be the same, spearman
# correlation is nan, to gaurad against this, we check the output # # correlation is nan, to gaurad against this, we check the output
# and return 0 in this case. # # and return 0 in this case.
if math.isnan(spearman_corrcoef): # if math.isnan(spearman_corrcoef):
spearman_corrcoef = 0 # spearman_corrcoef = 0
return {"spearmanr": spearman_corrcoef} # return {"spearmanr": spearman_corrcoef}
def f1_score_with_invalid(predictions, targets) -> dict: def f1_score_with_invalid(predictions, targets) -> dict:
@ -102,8 +110,8 @@ def f1_score(predictions, targets) -> dict:
Returns: Returns:
F1 score, where any prediction != 0 or 1 is counted as wrong. F1 score, where any prediction != 0 or 1 is counted as wrong.
""" """
targets = targets.astype(np.int32) targets = np.array(targets).astype(np.int32)
predictions = predictions.astype(np.int32) predictions = np.array(predictions).astype(np.int32)
return {"f1": 100 * sklearn.metrics.f1_score(targets, predictions)} return {"f1": 100 * sklearn.metrics.f1_score(targets, predictions)}
# TODO: maybe gaurd against invalid values https://stackoverflow.com/questions/56865344/how-do-i-calculate-the-matthews-correlation-coefficient-in-tensorflow # TODO: maybe gaurd against invalid values https://stackoverflow.com/questions/56865344/how-do-i-calculate-the-matthews-correlation-coefficient-in-tensorflow

View File

@ -26,10 +26,12 @@ You can also adapt this script on your own tasks.
import os import os
import sys import sys
os.environ['MKL_THREADING_LAYER'] = 'GNU' os.environ['MKL_THREADING_LAYER'] = 'GNU'
os.environ['MKL_SERVICE_FORCE_INTEL'] = '1' os.environ['MKL_SERVICE_FORCE_INTEL'] = '1'
os.environ["TOKENIZERS_PARALLELISM"] = "false" os.environ["TOKENIZERS_PARALLELISM"] = "false"
sys.path.append(os.path.join(os.getcwd(), "../")) sys.path.append(os.path.join(os.getcwd(), "../"))
# sys.path.append(os.path.join(os.getcwd(), "/mnt/sfs_turbo/zhangzhen/OpenDelta"))
sys.path.append(os.path.join(os.getcwd())) sys.path.append(os.path.join(os.getcwd()))
import functools import functools
@ -56,7 +58,7 @@ from transformers.trainer_utils import is_main_process, get_last_checkpoint
from data_processors import AutoTask #, #TaskDataCollatorForSeq2Seq, AutoPostProcessor, data_collator from data_processors import AutoTask #, #TaskDataCollatorForSeq2Seq, AutoPostProcessor, data_collator
from utils import read_json, save_json from utils import read_json, save_json
from utils.args import ModelArguments, TrainingArguments, DataTrainingArguments, RemainArgHfArgumentParser from utils.args import ModelArguments, TrainingArguments, DataTrainingArguments, DeltaArguments, RemainArgHfArgumentParser
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@ -66,16 +68,14 @@ def main():
# See all possible arguments in src/transformers/training_args.py # See all possible arguments in src/transformers/training_args.py
# or by passing the --help flag to this script. # or by passing the --help flag to this script.
# We now keep distinct sets of args, for a cleaner separation of concerns. # We now keep distinct sets of args, for a cleaner separation of concerns.
parser = RemainArgHfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments)) parser = RemainArgHfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments, DeltaArguments))
if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
# If we pass only one argument to the script and it's the path to a json file,
# let's parse it to get our arguments.
model_args, data_args, training_args, delta_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
else:
model_args, data_args, training_args, delta_args = parser.parse_args_into_dataclasses(return_remaining_strings=True)
# You can provide a json file with contains the arguments and use the --argument some_arg to override or append to the json file.
json_file, cmd_args = (os.path.abspath(sys.argv[1]), sys.argv[2:]) if sys.argv[1].endswith(".json") else (None, sys.argv[1:])
model_args, data_args, training_args, delta_args, remain_args = parser.parse_json_file_with_cmd_args(json_file=json_file, command_line_args=cmd_args)
logger.warning("The following arguments not used! {}".format(remain_args))
print(f"{training_args.output_dir}/results.json") logger.info(f"The results will be used in {training_args.output_dir}/results.json")
# exit() # exit()
# Detecting last checkpoint. # Detecting last checkpoint.
last_checkpoint = None last_checkpoint = None
@ -121,7 +121,8 @@ def main():
if os.path.basename(model_args.model_name_or_path).startswith("t5"): if os.path.basename(model_args.model_name_or_path).startswith("t5") \
or os.path.basename(model_args.model_name_or_path).startswith("long-t5") :
from examples_prompt.backbones.t5 import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts from examples_prompt.backbones.t5 import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
from examples_prompt.backbones.t5 import Trainer, DataCollator from examples_prompt.backbones.t5 import Trainer, DataCollator
elif os.path.basename(model_args.model_name_or_path).startswith("blenderbot"): elif os.path.basename(model_args.model_name_or_path).startswith("blenderbot"):
@ -129,7 +130,9 @@ def main():
from examples_prompt.backbones.blenderbot import Trainer, DataCollator from examples_prompt.backbones.blenderbot import Trainer, DataCollator
elif os.path.basename(model_args.model_name_or_path).startswith("roberta") \ elif os.path.basename(model_args.model_name_or_path).startswith("roberta") \
or os.path.basename(model_args.model_name_or_path).startswith("bert") \ or os.path.basename(model_args.model_name_or_path).startswith("bert") \
or os.path.basename(model_args.model_name_or_path).startswith("albert") : or os.path.basename(model_args.model_name_or_path).startswith("albert") \
or os.path.basename(model_args.model_name_or_path).startswith("xlm-roberta") \
or os.path.basename(model_args.model_name_or_path).startswith("deberta") :
from examples_prompt.backbones.bert import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts from examples_prompt.backbones.bert import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
from examples_prompt.backbones.bert import Trainer, DataCollator from examples_prompt.backbones.bert import Trainer, DataCollator
elif os.path.basename(model_args.model_name_or_path).startswith("beit"): elif os.path.basename(model_args.model_name_or_path).startswith("beit"):
@ -144,6 +147,10 @@ def main():
elif os.path.basename(model_args.model_name_or_path).startswith("clip"): elif os.path.basename(model_args.model_name_or_path).startswith("clip"):
from examples_prompt.backbones.clip import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts from examples_prompt.backbones.clip import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
from examples_prompt.backbones.clip import Trainer, DataCollator from examples_prompt.backbones.clip import Trainer, DataCollator
elif os.path.basename(model_args.model_name_or_path).startswith("opt") \
or os.path.basename(model_args.model_name_or_path).startswith("gpt"):
from examples_prompt.backbones.opt import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
from examples_prompt.backbones.opt import Trainer, DataCollator
@ -161,7 +168,8 @@ def main():
if delta_args.delta_type.lower() != "none": if delta_args.delta_type.lower() != "none":
from opendelta import AutoDeltaConfig,AutoDeltaModel from opendelta import AutoDeltaConfig,AutoDeltaModel
delta_config = AutoDeltaConfig.from_dict(vars(delta_args)) from dataclasses import asdict
delta_config = AutoDeltaConfig.from_dict(asdict(delta_args))
delta_model = AutoDeltaModel.from_config(delta_config, backbone_model=model) delta_model = AutoDeltaModel.from_config(delta_config, backbone_model=model)
delta_model.freeze_module(set_state_dict = True) delta_model.freeze_module(set_state_dict = True)
delta_model.log(delta_ratio=True, trainable_ratio=True, visualization=True) delta_model.log(delta_ratio=True, trainable_ratio=True, visualization=True)
@ -278,14 +286,9 @@ def main():
if torch.cuda.is_available() and training_args.compute_memory: if torch.cuda.is_available() and training_args.compute_memory:
peak_memory = (torch.cuda.max_memory_allocated() / 1024 ** 2)/1000 peak_memory = (torch.cuda.max_memory_allocated() / 1024 ** 2)/1000
print(
"Memory utilization",
peak_memory,
"GB"
)
performance_metrics.update({"peak_memory": peak_memory}) performance_metrics.update({"peak_memory": peak_memory})
if training_args.compute_memory or training_args.compute_time: if training_args.compute_memory or training_args.compute_time:
print("Efficiency Statistics {}".format(performance_metrics)) logger.info("Efficiency Statistics {}".format(performance_metrics))
trainer.save_metrics("performance", performance_metrics) trainer.save_metrics("performance", performance_metrics)
# Evaluation # Evaluation
@ -313,17 +316,30 @@ def main():
trainer.save_metrics(f"{data_args.task_name}_test", metrics) trainer.save_metrics(f"{data_args.task_name}_test", metrics)
all_results['test'][data_args.task_name] = metrics all_results['test'][data_args.task_name] = metrics
# from opendelta.utils.delta_hub import create_hub_repo_name
# from opendelta.utils.delta_center import create_delta_center_args, create_repo_name
# repo_name = create_hub_repo_name(root="DeltaHub", # repo_name = create_hub_repo_name(root="DeltaHub",
# dataset=data_args.task_name, # dataset=data_args.task_name,
# delta_type = delta_args.delta_type, # delta_type = delta_args.delta_type,
# model_name_or_path= model_args.model_name_or_path) # model_name_or_path= model_args.model_name_or_path)
# results['repo_name'] = repo_name
# if delta_args.delta_type.lower() != "none": # center_args =
# if training_args.push_to_hub: # TODO add description here # repo_name = create_repo_name(prefix="", center_args=center_args)
# delta_model.save_finetuned(push_to_hub=True, save_directory=repo_name, use_auth_token=True) # all_results['repo_name'] = repo_name
# # trainer.push_to_hub(**kwargs)
# else:
# delta_model.save_finetuned(push_to_hub=False, save_directory=repo_name, use_auth_token=True) delta_model.save_finetuned(finetuned_delta_path=delta_args.finetuned_delta_path,
push_to_dc=training_args.push_to_dc,
center_args={"test_performance":all_results['test'][data_args.task_name]['test_average_metrics'],
},
center_args_pool = {**vars(model_args), **vars(data_args), **vars(training_args), **vars(delta_args)},
list_tags = ['NLI'],
dict_tags = {'purpose':'for testing'},
delay_push=True,
test_result=all_results['test']
)
with open(f"{training_args.output_dir}/results.json", 'w') as fout: with open(f"{training_args.output_dir}/results.json", 'w') as fout:

View File

@ -0,0 +1,344 @@
# coding=utf-8
# Copyright OpenDelta Team and THUNLP lab. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
A unified runing scripts for most models to do down stream tasks in a
prompt learning fashion, i.e., No classification head, all tasks are casted
to mask prediction or span prediction tasks.
Processing relevant to different backbone models are stored in ../backbones/
Adding A few lines to integrate the Delta tuning methods.
You can also adapt this script on your own tasks.
"""
import os
import sys
os.environ['MKL_THREADING_LAYER'] = 'GNU'
os.environ['MKL_SERVICE_FORCE_INTEL'] = '1'
os.environ["TOKENIZERS_PARALLELISM"] = "false"
sys.path.append(os.path.join(os.getcwd(), "../"))
sys.path.append(os.path.join(os.getcwd()))
import functools
import logging
import torch
import json
import numpy as np
import transformers
from transformers import (
AutoConfig,
AutoModelForMaskedLM,
AutoModelForSeq2SeqLM,
AutoTokenizer,
DataCollatorForSeq2Seq,
# HfArgumentParser,
# MBartTokenizer,
# default_data_collator,
Trainer,
Seq2SeqTrainer,
set_seed,
)
from transformers.trainer_utils import is_main_process, get_last_checkpoint
from data_processors import AutoTask #, #TaskDataCollatorForSeq2Seq, AutoPostProcessor, data_collator
from utils import read_json, save_json
from utils.args import ModelArguments, TrainingArguments, DataTrainingArguments, RemainArgHfArgumentParser, DeltaArguments
logger = logging.getLogger(__name__)
def main():
parser = RemainArgHfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments, DeltaArguments))
# You can provide a json file with contains the arguments and use the --argument some_arg to override or append to the json file.
json_file, cmd_args = (os.path.abspath(sys.argv[1]), sys.argv[2:]) if sys.argv[1].endswith(".json") else (None, sys.argv[1:])
model_args, data_args, training_args, delta_args, remain_args = parser.parse_json_file_with_cmd_args(json_file=json_file, command_line_args=cmd_args)
logger.warning("The following arguments not used! {}".format(remain_args))
# # exit()
# # Detecting last checkpoint.
# last_checkpoint = None
# if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir:
# last_checkpoint = get_last_checkpoint(training_args.output_dir)
# print("#### last_checkpoint ", last_checkpoint)
# if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0:
# '''
# raise ValueError(
# f"Output directory ({training_args.output_dir}) already exists and is not empty. "
# "Use --overwrite_output_dir to overcome."
# )
# '''
# pass
# elif last_checkpoint is not None:
# logger.info(
# f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change "
# "the `--output_dir` or add `--overwrite_output_dir` to train from scratch."
# )
# Setup logging
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
handlers=[logging.StreamHandler(sys.stdout)],
)
logger.setLevel(logging.INFO if is_main_process(training_args.local_rank) else logging.WARN)
# Log on each process the small summary:
logger.warning(
f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
+ f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
)
# Set the verbosity to info of the Transformers logger (on main process only):
if is_main_process(training_args.local_rank):
transformers.utils.logging.set_verbosity_info()
# logger.info("Training/evaluation parameters %s", training_args, model_args, data_args, delta_args)
logger.info("{}\n{}\n{}\n{}".format(training_args, model_args, data_args, delta_args))
# Set seed before initializing model.
set_seed(training_args.seed)
if os.path.basename(model_args.model_name_or_path).startswith("t5"):
from examples_prompt.backbones.t5 import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
from examples_prompt.backbones.t5 import Trainer, DataCollator
elif os.path.basename(model_args.model_name_or_path).startswith("blenderbot"):
from examples_prompt.backbones.blenderbot import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
from examples_prompt.backbones.blenderbot import Trainer, DataCollator
elif os.path.basename(model_args.model_name_or_path).startswith("roberta") \
or os.path.basename(model_args.model_name_or_path).startswith("bert") \
or os.path.basename(model_args.model_name_or_path).startswith("albert") :
from examples_prompt.backbones.bert import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
from examples_prompt.backbones.bert import Trainer, DataCollator
elif os.path.basename(model_args.model_name_or_path).startswith("beit"):
from examples_prompt.backbones.beit import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
from examples_prompt.backbones.beit import Trainer, DataCollator
elif os.path.basename(model_args.model_name_or_path).startswith("bart"):
from examples_prompt.backbones.bart import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
from examples_prompt.backbones.bart import Trainer, DataCollator
elif os.path.basename(model_args.model_name_or_path).startswith("bigbird"):
from examples_prompt.backbones.bigbird import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
from examples_prompt.backbones.bigbird import Trainer, DataCollator
elif os.path.basename(model_args.model_name_or_path).startswith("clip"):
from examples_prompt.backbones.clip import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
from examples_prompt.backbones.clip import Trainer, DataCollator
config, tokenizer, model = get_backbone(model_args=model_args)
# model parallelize
if hasattr(training_args, "model_parallel") and training_args.model_parallel:
logger.info('parallelize model!')
model.parallelize()
from opendelta import Visualization
Visualization(model).structure_graph()
if delta_args.delta_type.lower() != "none":
from opendelta.delta_models.adapter import AdapterConfig, AdapterModel
delta_config = AdapterConfig.from_finetuned(finetuned_delta_path=delta_args.finetuned_delta_path)
delta_model = AdapterModel.from_finetuned(finetuned_delta_path=delta_args.finetuned_delta_path,
delta_config=delta_config,
backbone_model=model,
force_download=delta_args.force_download,
cache_dir=delta_args.delta_cache_dir)
# delta_model.freeze_module(set_state_dict = True)
delta_model.log(delta_ratio=True, trainable_ratio=True, visualization=True)
performance_metrics = {}
non_empty_splits_names = []
# if training_args.do_train:
# non_empty_splits_names.append("train")
# if training_args.do_eval:
# non_empty_splits_names.append("eval")
if training_args.do_test:
non_empty_splits_names.append("test")
splits = {}
for split_name in ['test']:
if split_name not in non_empty_splits_names:
splits[split_name] = None
continue
task = AutoTask.get(data_args.task_name,
data_args.dataset_config_name,
data_args=data_args,
seed=data_args.data_sample_seed)
dataset = task.get(split=split_name,
split_validation_test=training_args.split_validation_test,
n_obs=data_args.max_train_samples)
template, _verbalizer, tokenizer_wrapper = get_prompts(task, tokenizer, data_args)
dataset = dataset.map(
functools.partial(preprocess_function,
data_args=data_args,
tokenizer=tokenizer,
template=template,
verbalizer=_verbalizer,
tokenizer_wrapper=tokenizer_wrapper,
split=split_name),
batched=False,
num_proc=data_args.preprocessing_num_workers,
remove_columns=get_remove_columns(list(dataset.features.keys())),
load_from_cache_file=not data_args.overwrite_cache,
)
# from IPython import embed; embed()
splits[split_name] = dataset
if split_name == "test":
eval_task = task
verbalizer = _verbalizer
trainer = Trainer(
model=model,
verbalizer=verbalizer,
eval_task=eval_task,
args=training_args,
# train_dataset=splits['train'],
# eval_dataset=splits['eval'],
tokenizer=tokenizer,
data_collator=DataCollator(tokenizer),
)
def save_training_config(config_file, output_dir):
json_data = read_json(config_file)
save_json(os.path.join(output_dir, "training_config.json"), json_data)
# Saves training config.
if trainer.is_world_process_zero():
save_training_config(sys.argv[1], training_args.output_dir)
# # Training
# if training_args.do_train:
# checkpoint = None
# if training_args.resume_from_checkpoint is not None:
# checkpoint = training_args.resume_from_checkpoint
# elif last_checkpoint is not None:
# checkpoint = last_checkpoint
# if training_args.compute_time:
# torch.cuda.synchronize() # wait for move to complete
# start = torch.cuda.Event(enable_timing=True)
# end = torch.cuda.Event(enable_timing=True)
# start.record()
# train_result = trainer.train(resume_from_checkpoint=checkpoint)
# if training_args.compute_time:
# end.record()
# torch.cuda.synchronize() # wait for all_reduce to complete
# total_time = start.elapsed_time(end)/(1000*60)
# performance_metrics.update({"total_time in minutes ": total_time})
# trainer.save_model() # Saves the tokenizer too for easy upload
# train_metrics = train_result.metrics
# max_train_samples = (
# data_args.max_train_samples if data_args.max_train_samples is not None else len(splits['train'])
# )
# train_metrics["train_samples"] = min(max_train_samples, len(splits['train']))
# trainer.log_metrics("train", train_metrics)
# trainer.save_metrics("train", train_metrics)
# trainer.save_state()
# if torch.cuda.is_available() and training_args.compute_memory:
# peak_memory = (torch.cuda.max_memory_allocated() / 1024 ** 2)/1000
# print(
# "Memory utilization",
# peak_memory,
# "GB"
# )
# performance_metrics.update({"peak_memory": peak_memory})
# if training_args.compute_memory or training_args.compute_time:
# print("Efficiency Statistics {}".format(performance_metrics))
# trainer.save_metrics("performance", performance_metrics)
# Evaluation
all_results = {}
# all_results['evaluate'] = {}
# if training_args.do_eval:
# logger.info("*** Evaluate ***")
# metrics = trainer.evaluate(eval_dataset=splits['eval'],
# )
# trainer.log_metrics(f"{data_args.task_name}_eval", metrics)
# trainer.save_metrics(f"{data_args.task_name}_eval", metrics)
# all_results['evaluate'][data_args.task_name] = metrics
# Test
all_results['test'] = {}
if training_args.do_test:
logger.info("*** Test ***")
metrics = trainer.evaluate(eval_dataset=splits['test'],
metric_key_prefix="test"
)
trainer.log_metrics(f"{data_args.task_name}_test", metrics)
trainer.save_metrics(f"{data_args.task_name}_test", metrics)
all_results['test'][data_args.task_name] = metrics
# from opendelta.utils.delta_hub import create_hub_repo_name
# from opendelta.utils.delta_center import create_delta_center_args, create_repo_name
# repo_name = create_hub_repo_name(root="DeltaHub",
# dataset=data_args.task_name,
# delta_type = delta_args.delta_type,
# model_name_or_path= model_args.model_name_or_path)
# center_args =
# repo_name = create_repo_name(prefix="", center_args=center_args)
# all_results['repo_name'] = repo_name
# delta_model.save_finetuned(push_to_hf=training_args.push_to_hf,
# push_to_dc=training_args.push_to_dc,
# center_args={},
# center_args_pool = {**vars(model_args), **vars(data_args), **vars(training_args), **vars(delta_args)},
# delay_push=True,
# )
print(all_results)
# with open(f"{training_args.output_dir}/results.json", 'w') as fout:
# string = json.dumps(all_results, indent=4,sort_keys=True)
# fout.write(string+"\n")
return all_results
if __name__ == "__main__":
result = main()

View File

@ -1,6 +1,10 @@
from dataclasses import dataclass, field from dataclasses import dataclass, field
from typing import Optional, List from typing import Optional, List
from transformers import HfArgumentParser from transformers import HfArgumentParser
from pathlib import Path
import sys
@dataclass @dataclass
class ModelArguments: class ModelArguments:
@ -81,6 +85,10 @@ class TrainingArguments(HfTrainingArguments):
remove_unused_columns: Optional[bool] = field( remove_unused_columns: Optional[bool] = field(
default=False, metadata={"help": "Remove columns not required by the model when using an nlp.Dataset."} default=False, metadata={"help": "Remove columns not required by the model when using an nlp.Dataset."}
) )
push_to_hf: Optional[bool] = field(default=False, metadata={"help": "Push the model to huggingface model hub."})
push_to_dc: Optional[bool] = field(default=True, metadata={"help": "Push the model to delta center."})
@ -211,28 +219,254 @@ class DataTrainingArguments:
self.test_max_target_length = self.max_target_length self.test_max_target_length = self.max_target_length
import dataclasses
@dataclass
class DeltaArguments:
"""
Arguments pertaining to what data we are going to input our model for training and eval.
"""
delta_type: str= field(default="", metadata={"help": "the type of delta"})
backbone_model: Optional[str] = field(
default="", metadata={"help": "the backbone model"}
)
model_path_public: Optional[str] = field(
default="", metadata={"help": "the path (url) of the publicly available backbone model"}
)
modified_modules: Optional[List[str]] = field(
default_factory=lambda: None, metadata={"help": "the modules inside the backbone to be modified"}
)
unfrozen_modules: Optional[List[str]] = field(
default_factory=lambda:["deltas"], metadata={"help": "the modules inside the backbone or in the delta modules that need to be unfrozen"}
)
finetuned_delta_path: Optional[str] = field(
default=None, metadata={"help": "the path of the finetuned delta model"}
)
force_download: Optional[bool] = field(
default=False, metadata={"help": "whether to download the checkpoint form delta center no matter whether it exists"}
)
local_files_only: Optional[bool] = field(
default=False, metadata={"help": "whether not to look for file in delta center"}
)
delta_cache_dir: Optional[str] = field(
default=None, metadata={"help": "The cache path defined by user. If not set, we will firstly look into the"+
" working directory and then into the default cache path (ususally ~/.cache/delta_center)."}
)
delay_push: Optional[bool] = field(
default=True, metadata={
'help':'whether push the checkpoint to delta center later.'
}
)
def merge_arguments(self, objb):
print(objb)
self.__class__ = dataclasses.make_dataclass('DeltaArgument', fields=[(s.name, s.type, getattr(objb, s.name)) for s in dataclasses.fields(objb)], bases=(DeltaArguments,))
@dataclass
class AdapterArguments:
bottleneck_dim: Optional[int] = field(
default=24, metadata={"help": "the dimension of the bottleneck layer"}
)
@dataclass
class LoRAArguments:
lora_r: Optional[int] = field(
default=8, metadata={"help": "the rank of the LoRA metrics."}
)
@dataclass
class PrefixArguments:
pass
@dataclass
class BitFitArguments:
pass
@dataclass
class SoftPromptArguments:
soft_token_num: Optional[int] = field(
default=100, metadata={"help": "the num of soft tokens."}
)
@dataclass
class CompacterArguments:
pass
@dataclass
class LowRankAdapterArguments:
pass
# from opendelta.delta_models.adapter import AdapterConfig
# from opendelta.delta_models.bitfit import BitFitConfig
# from opendelta.delta_models.compacter import CompacterConfig
# from opendelta.delta_models.lora import LoraArguments
# from opendelta.delta_models.low_rank_adapter import LowRankAdapterConfig
# from opendelta.delta_models.prefix import PrefixConfig
# from opendelta.delta_models.soft_prompt import SoftPromptConfig
# DELTAARGMAP = {
# "adapter": AdapterConfig,
# "lora":LoraArguments,
# "prefix":PrefixConfig,
# "bitfit":BitFitConfig,
# "soft_prompt":SoftPromptConfig,
# "compacter":CompacterConfig,
# "low_rank_adapter":LowRankAdapterConfig
# }
DELTAARGMAP = {
"adapter": AdapterArguments,
"lora":LoRAArguments,
"prefix":PrefixArguments,
"bitfit":BitFitArguments,
"soft_prompt":SoftPromptArguments,
"compacter":CompacterArguments,
"low_rank_adapter":LowRankAdapterArguments
}
# TODO: add more specific delta arguments
class RemainArgHfArgumentParser(HfArgumentParser): class RemainArgHfArgumentParser(HfArgumentParser):
def parse_json_file(self, json_file: str, return_remaining_args=True ): '''This is a more powerful version of argument parser.
It can receiven both command line arguments and json file arguments.
The command line arguments will override the json file arguments.
The parser will load the specific delta arguments (e.g. Adapter's)
according to the delta_type argument. And merge the specific delta arguments
with the common delta arguments.
'''
def parse_json_file_with_cmd_args(self, json_file: str, command_line_args=None, return_remaining_args=True ):
""" """
Alternative helper method that does not use `argparse` at all, instead loading a json file and populating the Alternative helper method that does not use `argparse` at all, instead loading a json file and populating the
dataclass types. dataclass types.
""" """
import argparse
import json import json
from pathlib import Path from pathlib import Path
import dataclasses
data = json.loads(Path(json_file).read_text()) data = json.loads(Path(json_file).read_text())
data_str = ""
if command_line_args is None:
command_line_args = []
for key in data:
if "--"+key not in command_line_args:
if isinstance(data[key], list):
data_str += "--"+key
for elem in data[key]:
data_str+=" "+ str(elem)
data_str += " "
else:
data_str+= "--" + key + " " + str(data[key]) + " "
data_list = data_str.split()
data_list += command_line_args
if return_remaining_args:
outputs, remain_args = self.parse_args_into_dataclasses(args=data_list, return_remaining_strings=return_remaining_args)
for d in outputs:
if isinstance(d, DeltaArguments): # merge the specific delta arguments
d.merge_arguments(outputs[-1])
return [*(outputs[:-1]), remain_args]
else:
outputs = self.parse_args_into_dataclasses(args=data_list, return_remaining_strings=return_remaining_args)
for d in outputs:
if isinstance(d, DeltaArguments):
d.merge_arguments(outputs[-1])
return [*(outputs[:-1]),]
def parse_args_into_dataclasses(
self, args=None, return_remaining_strings=False, look_for_args_file=True, args_filename=None
):
"""
Parse command-line args into instances of the specified dataclass types.
This relies on argparse's `ArgumentParser.parse_known_args`. See the doc at:
docs.python.org/3.7/library/argparse.html#argparse.ArgumentParser.parse_args
Args:
args:
List of strings to parse. The default is taken from sys.argv. (same as argparse.ArgumentParser)
return_remaining_strings:
If true, also return a list of remaining argument strings.
look_for_args_file:
If true, will look for a ".args" file with the same base name as the entry point script for this
process, and will append its potential content to the command line args.
args_filename:
If not None, will uses this file instead of the ".args" file specified in the previous argument.
Returns:
Tuple consisting of:
- the dataclass instances in the same order as they were passed to the initializer.abspath
- if applicable, an additional namespace for more (non-dataclass backed) arguments added to the parser
after initialization.
- The potential list of remaining argument strings. (same as argparse.ArgumentParser.parse_known_args)
"""
if args_filename or (look_for_args_file and len(sys.argv)):
if args_filename:
args_file = Path(args_filename)
else:
args_file = Path(sys.argv[0]).with_suffix(".args")
if args_file.exists():
fargs = args_file.read_text().split()
args = fargs + args if args is not None else fargs + sys.argv[1:]
# in case of duplicate arguments the first one has precedence
# so we append rather than prepend.
namespace, remaining_args = self.parse_known_args(args=args)
# conditionally add delta arguments
deltatype_args = DELTAARGMAP[namespace.delta_type]
self.dataclass_types.append(deltatype_args)
self._add_dataclass_arguments(deltatype_args)
# parse the arguments again, this time with the specific delta type's arguments
namespace, remaining_args = self.parse_known_args(args=args)
outputs = [] outputs = []
for dtype in self.dataclass_types: for dtype in self.dataclass_types:
keys = {f.name for f in dataclasses.fields(dtype) if f.init} keys = {f.name for f in dataclasses.fields(dtype) if f.init}
inputs = {k: data.pop(k) for k in list(data.keys()) if k in keys} inputs = {k: v for k, v in vars(namespace).items() if k in keys}
for k in keys:
delattr(namespace, k)
obj = dtype(**inputs) obj = dtype(**inputs)
outputs.append(obj) outputs.append(obj)
if len(namespace.__dict__) > 0:
remain_args = argparse.ArgumentParser() # additional namespace.
remain_args.__dict__.update(data) outputs.append(namespace)
if return_remaining_args: if return_remaining_strings:
return (*outputs, remain_args) return (outputs, remaining_args)
else: else:
return (*outputs,) if remaining_args:
raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
return outputs
# namespace, remaining_args = self.parse_known_args(args=data_list)
# print("Here", command_line_args, data_list,namespace, remaining_args)
# data.update(remain_args)
# outputs = []
# for dtype in self.dataclass_types:
# keys = {f.name for f in dataclasses.fields(dtype) if f.init}
# inputs = {k: namespace.get(k) for k in list(data.keys()) if k in keys}
# obj = dtype(**inputs)
# outputs.append(obj)
# # remain_args = argparse.ArgumentParser()
# remain_args.__dict__.update(remain_args)
# if return_remaining_args:
# return (*outputs, remain_args)
# else:
# return (*outputs,)

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "cola",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 20,
"output_dir": "outputs/adapter/roberta-base/cola",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "cola",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "cola",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "mnli",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 3,
"output_dir": "outputs/adapter/roberta-base/mnli",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "mnli",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "mnli",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "mrpc",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 20,
"output_dir": "outputs/adapter/roberta-base/mrpc",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "mrpc",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "mrpc",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "qnli",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 3,
"output_dir": "outputs/adapter/roberta-base/qnli",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "qnli",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "qnli",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "qqp",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 3,
"output_dir": "outputs/adapter/roberta-base/qqp",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "qqp",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "qqp",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "rte",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 20,
"output_dir": "outputs/adapter/roberta-base/rte",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": false,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "rte",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "rte",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "sst2",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 3,
"output_dir": "outputs/adapter/roberta-base/sst2",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "sst2",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "sst2",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "stsb",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 20,
"output_dir": "outputs/adapter/roberta-base/stsb",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "stsb",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "stsb",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-boolq",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 20,
"output_dir": "outputs/adapter/roberta-base/superglue-boolq",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "superglue-boolq",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-boolq",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-cb",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 20,
"output_dir": "outputs/adapter/roberta-base/superglue-cb",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "superglue-cb",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-cb",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-copa",
"eval_steps": 50,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 40,
"output_dir": "outputs/adapter/roberta-base/superglue-copa",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 50,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "superglue-copa",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-copa",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-multirc",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 3,
"output_dir": "outputs/adapter/roberta-base/superglue-multirc",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "superglue-multirc",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-multirc",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-record",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 512,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 3,
"output_dir": "outputs/adapter/roberta-base/superglue-record",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 16,
"per_device_train_batch_size": 16,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "superglue-record",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-record",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-wic",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 20,
"output_dir": "outputs/adapter/roberta-base/superglue-wic",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "superglue-wic",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-wic",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -0,0 +1,46 @@
{
"bottleneck_dim": 24,
"dataset_config_name": [
"en"
],
"delta_type": "adapter",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-wsc.fixed",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "eval_accuracy",
"model_name_or_path": "roberta-base",
"num_train_epochs": 20,
"output_dir": "outputs/adapter/roberta-base/superglue-wsc.fixed",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"task_name": "superglue-wsc.fixed",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-wsc.fixed",
"tokenizer_name": "roberta-base",
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier"
],
"warmup_steps": 0
}

View File

@ -161,6 +161,20 @@ AllConfigs['adapter_roberta-base'].update({
"output_dir": "outputs/adapter/roberta-base/", "output_dir": "outputs/adapter/roberta-base/",
}) })
AllConfigs['parallel_adapter_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base'])
AllConfigs['parallel_adapter_roberta-base'].update({
"delta_type": "parallel_adapter",
"learning_rate": 3e-4,
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier",
],
"bottleneck_dim":24,
"output_dir": "outputs/parallel_adapter/roberta-base/",
})
AllConfigs['lora_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base']) AllConfigs['lora_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base'])
AllConfigs['lora_roberta-base'].update({ AllConfigs['lora_roberta-base'].update({
"delta_type": "lora", "delta_type": "lora",

Some files were not shown because too many files have changed in this diff Show More