temporary add

2022-10-20 10:16:05 +00:00 · 2022-10-20 10:16:05 +00:00 · e0de6b02ad
parent 2d3fc201d2 dd80f0d1b6
commit e0de6b02ad
180 changed files with 5407 additions and 1425 deletions
--- a/.gitignore
+++ b/.gitignore
@ -35,4 +35,27 @@ log.txt
 **/examples/examples_bmtrain/BMPretrain
 **/examples/examples_bmtrain/BigModels/BigModels/results
 **/Delta_Memory/
 **/output/
 **/thunlp/
 **/saved_ckpts/
 DeltaCenter-Python-Client/
 backbone_structure
 delta_checkpoints
 gitop.sh
 load_dataset_and_model.ipynb
 load_model.py
 scripts
 t.py
 t.sh
 !examples/examples_prompt/configs/*/*.json
 !examples/examples_prompt/configs/**
 **/delta_checkpoints/
 **/outputs/
 **/unittest/**
 !unittest/**.py
 !unittest/**.sh
--- a/README.md
+++ b/README.md
@ -26,16 +26,18 @@
 OpenDelta is a toolkit for parameter-efficient tuning methods (we dub it *delta tuning*), by which users can flexibly assign (or add) a small number of parameters to update while keeping most parameters frozen. By using OpenDelta, users can easily implement prefix-tuning, adapters, LoRA, or any other type of delta tuning with their preferred PTMs.
- Our repo is tested on Python 3.8 and PyTorch 1.9.0. Lower version may also be supported. 
+- The latest version of OpenDelta is tested on Python==3.8.13, PyTorch==1.12.1, transformers==4.22.2. Other versions are likely to be supported as well. If you encounter bugs when using your own package versions, please raise an issue and we will look into it as soon as possible. 
 - **A demo of using Opendelta to modify the PLM (E.g., BART).**
 ![How PLM changes using Delta-tuning](docs/source/imgs/demo.gif)
-## Updates
+## News
- 2022.03.24 We notice several bugs in Soft Prompt Tuning and Prefix Tuning, mainly due to their need to customize attention ids, token_type_ids, we are fixing it! Currently, please use the other methods since they are stabler and better in performance. 
+- **2022.10.14** Release v0.3.0. We make the default configurations of each delta tuning method (i.e., the positions where they are attached) easier to use! If a custom model contains one of our supported models as a submodule, the default configuration is also available. Other key changes can be seen in [Update Log](https://opendelta.readthedocs.io/en/latest/notes/update.html#version-0-3-0)
- 2022.03.20 Add a [colab example](https://colab.research.google.com/drive/1uAhgAdc8Qr42UKYDlgUv0f7W1-gAFwGo?usp=sharing) to illustrate efficient training and space-saving multitask-serving.
+- **2022.10.10** Merge a long-developed branch v0.2.4 into the master branch. Key updates are (1) an example unifying the delta tuning paradigm and the prompt-tuning paradigm; and (2) support for [Delta Center](https://www.openbmb.org/toolKits/deltacenter), whose webpage is still under construction. Details can be seen in [Update Log](https://opendelta.readthedocs.io/en/latest/notes/update.html#version-0-2-4)
- 2022.03.20 A new pip version released.
+- **2022.03.24** We noticed several bugs in Soft Prompt Tuning and Prefix Tuning, mainly due to their need to customize attention ids and token_type_ids. We are fixing them! For now, please use the other methods, since they are more stable and perform better. 
- 2022.02.16 Support [regular expression](https://opendelta.readthedocs.io/en/latest/notes/namebasedaddr.html#regexexpr) in named-based addressing. 
+- **2022.03.20** Add a [colab example](https://colab.research.google.com/drive/1uAhgAdc8Qr42UKYDlgUv0f7W1-gAFwGo?usp=sharing) to illustrate efficient training and space-saving multitask-serving.
 - **2022.03.20** A new pip version released.
 - **2022.02.16** Support [regular expression](https://opendelta.readthedocs.io/en/latest/notes/namebasedaddr.html#regexexpr) in named-based addressing. 
 ## Installation
 create a virtualenv (optional)
@ -72,20 +74,95 @@ python setup.py install
 python setup.py develop
 ```
-## Must Try
+#### Tips
 - If you want to use mirror for installing the packages, please change the `index_url` in [setup.cfg](setup.cfg)
-```python
+- If you encounter a network error when using setup.py, please first install the dependencies via
-from transformers import AutoModelForSeq2SeqLM
+```shell
-t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
+pip install -r requirements.txt && python setup.py develop
 from opendelta import AutoDeltaModel
 delta = AutoDeltaModel.from_finetuned("thunlp/FactQA_T5-large_Adapter", backbone_model=t5)
 delta.log()
 ```
-## Verified Supported Models
+## Must Try
 The following code and comments walk you through the key functionality of OpenDelta. It is also available in [must_try.py](https://github.com/thunlp/OpenDelta/tree/main/examples/unittest/must_try.py)
 ```python
 # use transformers as usual.
 from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
 t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
 t5_tokenizer = AutoTokenizer.from_pretrained("t5-large")
 # A running example
 inputs_ids = t5_tokenizer.encode("Is Harry Poter wrtten by JKrowling", return_tensors="pt")
 t5_tokenizer.decode(t5.generate(inputs_ids)[0]) 
 # >>> '<pad><extra_id_0>? Is it Harry Potter?</s>'
 # use existing delta models
 from opendelta import AutoDeltaModel, AutoDeltaConfig
 # use existing delta models from DeltaCenter
 delta = AutoDeltaModel.from_finetuned("thunlp/Spelling_Correction_T5_LRAdapter_demo", backbone_model=t5)
 # freeze the whole backbone model except the delta models.
 delta.freeze_module()
 # visualize the change
 delta.log()
 t5_tokenizer.decode(t5.generate(inputs_ids)[0]) 
 # >>> <pad> Is Harry Potter written by JK Rowling?</s>
 # Now save merely the delta models, not the whole backbone model, to .tmp/
 delta.save_finetuned(".tmp")
 import os; os.listdir(".tmp")
 # >>>  The state dict size is 1.443 MB
 # >>>  We encourage users to push their final and public models to delta center to share them with the community!
 # reload the model from the local path and add it to pre-trained T5.
 t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
 delta1 = AutoDeltaModel.from_finetuned(".tmp", backbone_model=t5)
 import shutil; shutil.rmtree(".tmp") # don't forget to remove the tmp files. 
 t5_tokenizer.decode(t5.generate(inputs_ids)[0]) 
 # >>> <pad> Is Harry Potter written by JK Rowling?</s>
 # detach the delta models, the model returns to the unmodified status.
 delta1.detach()
 t5_tokenizer.decode(t5.generate(inputs_ids)[0])  
 # >>> '<pad><extra_id_0>? Is it Harry Potter?</s>'
 # use the default configuration for customized wrapped models which have PLMs inside. This is a common need for users. 
 import torch.nn as nn
 class WrappedModel(nn.Module):
  def __init__(self, inner_model):
    super().__init__()
    self.inner = inner_model
  def forward(self, *args, **kwargs):
    return self.inner(*args, **kwargs)
 wrapped_model = WrappedModel(WrappedModel(t5))
 # say we use LoRA
 delta_config = AutoDeltaConfig.from_dict({"delta_type":"lora"})
 delta2 = AutoDeltaModel.from_config(delta_config, backbone_model=wrapped_model)
 delta2.log()
 # >>> root
 #       -- inner
 #          -- inner
 #             ...
 #             ... lora_A:[8,1024], lora_B:[1024,8]
 delta2.detach()
 # use a non-default configuration
 # say we add lora to the last four layers of the decoder of t5, with lora rank=5
 delta_config3 = AutoDeltaConfig.from_dict({"delta_type":"lora", "modified_modules":["[r]decoder.*((20)|(21)|(22)|(23)).*DenseReluDense\.wi"], "lora_r":5})
 delta3 = AutoDeltaModel.from_config(delta_config3, backbone_model=wrapped_model)
 delta3.log()
 ```
 ## Verified Default Configurations  
 - **You can try to use OpenDelta on *any* backbone models based on PyTorch.**  
- However, with small chances thatThe interface of the submodules of the backbone model is not supported. Therefore we verified some commonly
+- However, there is a small chance that the interface of the submodules of the backbone model is not supported. Therefore, we verified some commonly
 used models that OpenDelta are sure to support.
 - We will keep testing more and more emerging models.
@ -107,3 +184,5 @@ used models that OpenDelta are sure to support.
--- a/dist/opendelta-0.2.0-py3-none-any.whl
+++ b/dist/opendelta-0.2.0-py3-none-any.whl
--- a/dist/opendelta-0.2.0.tar.gz
+++ b/dist/opendelta-0.2.0.tar.gz
--- a/dist/opendelta-0.2.1-py3-none-any.whl
+++ b/dist/opendelta-0.2.1-py3-none-any.whl
--- a/dist/opendelta-0.2.1.tar.gz
+++ b/dist/opendelta-0.2.1.tar.gz
--- a/dist/opendelta-0.2.2-py3-none-any.whl
+++ b/dist/opendelta-0.2.2-py3-none-any.whl
--- a/dist/opendelta-0.2.2.tar.gz
+++ b/dist/opendelta-0.2.2.tar.gz
--- a/dist/opendelta-0.2.3-py3-none-any.whl
+++ b/dist/opendelta-0.2.3-py3-none-any.whl
--- a/dist/opendelta-0.2.3.tar.gz
+++ b/dist/opendelta-0.2.3.tar.gz
--- a/dist/opendelta-0.2.4-py3-none-any.whl
+++ b/dist/opendelta-0.2.4-py3-none-any.whl
--- a/dist/opendelta-0.2.4.tar.gz
+++ b/dist/opendelta-0.2.4.tar.gz
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@ -1,13 +1,17 @@
 sphinx_copybutton
 sphinx_rtd_theme
 sphinx_toolbox
-torch
+myst_parser
-transformers
+
-sentencepiece==0.1.96
+torch>=1.8.0
-tqdm==4.62.2
+transformers>=4.10.0
-openprompt
+datasets==1.17.0
-loralib
+sentencepiece>=0.1.96
 tqdm>=4.62.2
 decorator
 rich
-myst_parser
+web.py
-web.py
+gitpython
 scipy # need?
 sklearn # need?
 delta_center_client==0.0.4
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@ -19,7 +19,9 @@ import datetime
 import sphinx_rtd_theme
 import doctest
 import opendelta
-import opendelta.delta_models
+
 # -- Project information -----------------------------------------------------
@ -29,8 +31,8 @@ copyright = '{}, {}, Licenced under the Apache License, Version 2.0'.format(date
 # The full version, including alpha/beta/rc tags
-release = '0.1.1'
+release = '0.3.1'
-version = "0.1.1"
+version = "0.3.1"
 html_theme = 'sphinx_rtd_theme'
 html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
--- a/docs/source/index.md
+++ b/docs/source/index.md
@ -1,7 +1,7 @@
 OpenDelta's documentation!
 =====================================
-OpenDelta is a **Plug-and-play** Library of the parameter-efficient fine-tuning ([delta-tuning](WhatisDelta)) technology for pre-trained models.
+[OpenDelta](https://github.com/thunlp/OpenDelta/) is a **Plug-and-play** Library of the parameter-efficient fine-tuning ([delta-tuning](WhatisDelta)) technology for pre-trained models.
 ## Essential Advantages:
@ -35,11 +35,18 @@ OpenDelta is a **Plug-and-play** Library of the parameter-efficient fine-tuning
   notes/pluginunplug.md
   notes/acceleration.md
   notes/explored_config.md
 .. toctree::
   :maxdepth: 1
   :caption: Information
   notes/citation.md
   notes/update.md
   notes/faq.md
 .. toctree::
   :maxdepth: 2
-   :caption: Package Reference
+   :caption: Documentation
   modules/base
   modules/deltas
--- a/docs/source/notes/citation.md
+++ b/docs/source/notes/citation.md
@ -1,3 +1,12 @@
 # Citation
-<img src="../imgs/todo-icon.jpeg" height="30px"> We are working on a technical report.
+If you find our repo useful, please cite the following paper. 
 ```
@article{ding2022delta,
  title={Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models},
  author={Ding, Ning and Qin, Yujia and Yang, Guang and Wei, Fuchao and Yang, Zonghan and Su, Yusheng and Hu, Shengding and Chen, Yulin and Chan, Chi-Min and Chen, Weize and others},
  journal={arXiv preprint arXiv:2203.06904},
  year={2022}
 }
 ```
--- a/docs/source/notes/composition.md
+++ b/docs/source/notes/composition.md
@ -1,10 +1,9 @@
 (composition)=
 # Composition of delta models
 With OpenDelta, you can perform composition of different delta models.
-### Add different deltas to the backbone
+## Add different deltas to the backbone
 ```
 from transformers import AutoModelForSequenceClassification
@ -18,14 +17,14 @@ delta_model.log()
 ```{figure} ../imgs/composition_of_delta.png
 ---
 width: 600px
-name: defaultmodification
+name: composition_of_delta
 ---
 ```
 ````
-### Even add multiple delta to the same layer
+## Even add multiple delta to the same layer
 ```
 from transformers import AutoModelForSequenceClassification
@ -40,7 +39,7 @@ delta_model.log()
 ```{figure} ../imgs/multiple_to_one_layer.png
 ---
 width: 600px
-name: defaultmodification
+name: multiple_to_one_layer
 ---
 ```
 ````
--- a/docs/source/notes/explored_config.md
+++ b/docs/source/notes/explored_config.md
@ -1,11 +1,7 @@
 (favoredconfiguration)=
 # Favored Configuration
-<img src="../imgs/todo-icon.jpeg" height="30px"> We will add the commonly used configuration of delta models HERE in future.
+Generally, the default configurations are already good enough. If you want to squeeze the size of delta models further, you can refer to the following papers.
-E.g.
+ - [AdapterDrop: On the Efficiency of Adapters in Transformers](https://arxiv.org/abs/2010.11918)
- the modified_modules (position of delta), 
+ - [Sparse Structure Search for Parameter-Efficient Tuning(Delta Tuning)](https://arxiv.org/abs/2206.07382)
 - hyperparameter that are the most efficient
 - the favored composition between delta models
 Currently, use the default setting, explore it by yourself, or refer to existing papers' configuration!
--- a/docs/source/notes/faq.md
+++ b/docs/source/notes/faq.md
@ -0,0 +1,14 @@
 # FAQs
 1. **Why do I encounter NotImplementedError in Prefix Tuning?**
    This is because we find no easy way to get a unified Prefix Tuning implementation for different attention classes. If you really want to use Prefix Tuning for models we do not yet support, you can implement the ``PrefixLayerYOURMODEL`` on your own or raise an issue to request the feature for your model. 
 2. **Available Models with default configurations are ..., Please manually add the delta models by speicifying 'modified_modules' based on the visualization of your model structure**
    Although most pre-trained models (PTMs) use the Transformer architecture, they are implemented differently. For example, the attention modules in GPT2 and BERT are not only named differently, but also implemented in different ways. Common structure mapping maps the different naming conventions of different PTMs into a unified naming convention. However, there are many PTMs that we do not currently cover. But don't worry! For these models, you can figure out which modules you should modify by simply [visualizing the PTMs](visualization), and then specify the `modified_modules` manually (see [name-based addressing](namebasedaddr)). 
 3. **Requires a dummy_inputs to be passed through the model to understand the dimensionality of each tensor in the computation graph. The {module.__class__.__name__} Class has no dummy_inputs, and automatically created dummy_inputs failed.**
    The `dummy_inputs` can be any data that make `backbone_model.forward(**dummy_inputs)` succeed. Only the form and shape of the `dummy_inputs` matter. To set dummy_inputs for your model, please use: `setattr(backbone_model, 'dummy_inputs', some_dummy_inputs)` before initializing `{self.__class__.__name__}`.
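
Note on FAQ item 3: the following is a minimal sketch of the `setattr` pattern described above, not OpenDelta code. `ToyBackbone` is a hypothetical stand-in written in plain Python so the snippet runs without PyTorch; with a real backbone the values would typically be tensors, and the delta model would be constructed after the `setattr` call.

```python
class ToyBackbone:
    """Hypothetical stand-in for a backbone model (a real one would be a torch.nn.Module)."""

    def forward(self, input_ids, attention_mask):
        # Only the form and shape of the inputs are checked, mirroring the FAQ's point.
        if len(input_ids) != len(attention_mask):
            raise ValueError("input_ids and attention_mask must have the same batch size")
        for ids, mask in zip(input_ids, attention_mask):
            if len(ids) != len(mask):
                raise ValueError("each row of input_ids and attention_mask must match in length")
        return {"batch_size": len(input_ids), "seq_len": len(input_ids[0])}


backbone_model = ToyBackbone()

# The concrete values are arbitrary; only the keys and shapes matter.
some_dummy_inputs = {
    "input_ids": [[0, 1, 2, 3]],
    "attention_mask": [[1, 1, 1, 1]],
}

# Attach the inputs before constructing the delta model, as the FAQ suggests.
setattr(backbone_model, "dummy_inputs", some_dummy_inputs)

# Sanity check: the forward pass must succeed with exactly these inputs.
result = backbone_model.forward(**backbone_model.dummy_inputs)
print(result)
```

Running the forward pass once by hand, as in the last two lines, is a cheap way to confirm that the chosen `dummy_inputs` are acceptable before handing the model to a delta class.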
--- a/docs/source/notes/keyfeature.md
+++ b/docs/source/notes/keyfeature.md
@ -38,7 +38,7 @@ We use three key functions to achieve the modifications to the backbone model ou
   - **parallel insertion**
    Adapters can also be used in a parallel fashion (see [Paper](https://arxiv.org/abs/2110.04366)).
-    For these methods, use [insert_parallel_module](opendelta.basemodel.DeltaBase.insert_parrellel_module) interface.
+    For these methods, use [insert_parallel_module](opendelta.basemodel.DeltaBase.insert_parallel_module) interface.
 :::{admonition} Doc-preserving Insertion
--- a/docs/source/notes/namebasedaddr.md
+++ b/docs/source/notes/namebasedaddr.md
@ -1,4 +1,4 @@
-(namebasedaddr)=
+
 # Name-based Addressing
 Name-based addressing is what sets OpenDelta apart from other packages and makes it applicable to a broader range of models (even emerging ones).
@ -52,7 +52,7 @@ In this case, string `"name_b.0.name_a"` will be the name to address the submodu
 Thus when applying a delta model to this toy net.
-```
+```python
 from opendelta import AdapterModel
 AdapterModel(backbone_model=root, modified_modules=['name_b.0.name_a'])
 Visualization(root).structure_graph()
@ -67,7 +67,7 @@ name: toy-delta
 ```
 ````
-
+(targetmodules)=
 ## Target modules.
 For different delta methods, the operation for the modification target is different.
@ -88,7 +88,7 @@ Handcrafting the full names of submodules can be frustrating. We made some simpl
 1. **End-matching** Rules.
    OpenDelta will take every module that 
-    **ends with** the provided name suffix as the modification [target module](target_module). 
+    **ends with** the provided name suffix as the modification [target module](targetmodules). 
    :::{admonition} Example
    :class: tip
    Taking DistilBert with a classifier on top as an example:
@ -115,7 +115,7 @@ Handcrafting the full names of submodules can be frustrating. We made some simpl
    :::{admonition} Regex in Json Configs 
    :class: warning
    In json, you should write `"\\."` instead of `"\."` for a real dot due to json parsing rules. That is 
-    ```json
+    ```
    {   
        ...
        "modified_modules": ["[r][0-5]\\.attention"],
@ -138,7 +138,7 @@ Handcrafting the full names of submodules can be frustrating. We made some simpl
    delta_model = LoraModel(backbone_model=model, interactive_modify=True)
    ```
-    by setting `interactive_modify`, a web server will be opened on local host, and the link will be print in the terminal.
+    By setting `interactive_modify`, a web server will be opened on localhost, and the link will be printed in the terminal, e.g.,
    ```
    http://0.0.0.0:8888/
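
Note on the namebasedaddr.md hunks above: the JSON escaping rule and the two addressing styles (end-matching and `[r]` regular expressions) can be illustrated with the standard library alone. This is a simplified sketch under stated assumptions: the `matches` helper and the module names are hypothetical, and the matching rule is an approximation of the documented behaviour, not OpenDelta's actual implementation.

```python
import json
import re

# In a JSON file the backslash itself must be escaped: "\\." in the file
# becomes the two-character regex token \. after parsing.
json_text = r'{"modified_modules": ["[r][0-5]\\.attention"]}'
config = json.loads(json_text)
key = config["modified_modules"][0]
print(key)

# A single backslash before the dot is not a valid JSON escape sequence.
try:
    json.loads(r'{"modified_modules": ["[r][0-5]\.attention"]}')
    single_backslash_ok = True
except json.JSONDecodeError:
    single_backslash_ok = False
print(single_backslash_ok)


def matches(key: str, module_name: str) -> bool:
    """Assumed, simplified matching rule (hypothetical helper)."""
    if key.startswith("[r]"):
        # Regex keys: search anywhere in the dotted module name.
        return re.search(key[3:], module_name) is not None
    # Plain keys: end-matching on whole dotted name components.
    return module_name == key or module_name.endswith("." + key)


# Hypothetical module names in the style of a transformer encoder.
names = [
    "encoder.layer.3.attention",
    "encoder.layer.7.attention",
    "encoder.layer.3.output.dense",
]
regex_hits = [n for n in names if matches(key, n)]
suffix_hits = [n for n in names if matches("output.dense", n)]
print(regex_hits)
print(suffix_hits)
```

The regex key selects only the attention module of layer 3, because `[0-5]` excludes layer 7, while the plain key `output.dense` selects by suffix.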
--- a/docs/source/notes/pluginunplug.md
+++ b/docs/source/notes/pluginunplug.md
@ -19,7 +19,7 @@ delta_model.log()
 ```{figure} ../imgs/plugunplug1.png
 ---
 width: 800px
-name: defaultmodification
+name: plugunplug1
 ---
 ```
 ````
@ -33,7 +33,7 @@ delta_model.log()
 ```{figure} ../imgs/plugunplug2.png
 ---
 width: 800px
-name: defaultmodification
+name: plugunplug2
 ---
 ```
 ````
@ -48,7 +48,7 @@ delta_model.log()
 ```{figure} ../imgs/plugunplug3.png
 ---
 width: 800px
-name: defaultmodification
+name: plugunplug3
 ---
 ```
 ````
@ -67,7 +67,7 @@ delta_model2.log()
 ```{figure} ../imgs/plugunplug4.png
 ---
 width: 800px
-name: defaultmodification
+name: plugunplug4
 ---
 ```
 ````
@ -81,7 +81,7 @@ delta_model.log()
 ```{figure} ../imgs/plugunplug5.png
 ---
 width: 800px
-name: defaultmodification
+name: plugunplug5
 ---
 ```
 ````
@ -96,7 +96,7 @@ delta_model.log()
 ```{figure} ../imgs/plugunplug6.png
 ---
 width: 800px
-name: defaultmodification
+name: plugunplug6
 ---
 ```
 ````
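
Note on the figure renames in composition.md and pluginunplug.md: every figure previously shared `name: defaultmodification`, and the hunks above give each one a unique name, presumably because a shared name makes cross-references ambiguous. A small checker along the following lines could catch such collisions before a docs build. It is a sketch only: the function name, the regular expression, and the shortened in-memory sources are all illustrative assumptions, and it does not use any Sphinx or MyST API.

```python
import re
from collections import Counter

# Matches both option styles seen in the docs: "name: foo" and ":name: foo".
NAME_OPTION = re.compile(r"^[ \t]*:?name:[ \t]*(\S+)[ \t]*$", re.MULTILINE)

FENCE = "`" * 3  # built indirectly to keep this example easy to embed in Markdown


def figure_block(image: str, name: str) -> str:
    """Build a MyST figure directive like the ones in the diff (shortened)."""
    return f"{FENCE}{{figure}} ../imgs/{image}\n---\nwidth: 800px\nname: {name}\n---\n{FENCE}\n"


def duplicate_figure_names(sources: dict) -> list:
    """Return the names that occur more than once across all given texts."""
    counts = Counter()
    for text in sources.values():
        counts.update(NAME_OPTION.findall(text))
    return sorted(name for name, n in counts.items() if n > 1)


# State before the commit: every figure used the same name.
before = {
    "composition.md": figure_block("composition_of_delta.png", "defaultmodification"),
    "pluginunplug.md": figure_block("plugunplug1.png", "defaultmodification")
    + figure_block("plugunplug2.png", "defaultmodification"),
}

# State after the commit: names follow the image file names.
after = {
    "composition.md": figure_block("composition_of_delta.png", "composition_of_delta"),
    "pluginunplug.md": figure_block("plugunplug1.png", "plugunplug1")
    + figure_block("plugunplug2.png", "plugunplug2"),
}

dups_before = duplicate_figure_names(before)
dups_after = duplicate_figure_names(after)
print(dups_before)
print(dups_after)
```

In a real checkout, the `sources` dictionary could be filled by reading the Markdown files under `docs/source/notes/`.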
--- a/docs/source/notes/saveload.md
+++ b/docs/source/notes/saveload.md
@ -1,4 +1,3 @@
 (saveload)=
 # Save and Share the Delta
 ## Space efficient saving without changing the code.
@ -95,4 +94,4 @@ If you are satisfied with your checkpoint, do not forget to share your model to
 ## Save & Load for Composition of Delta
-<img src="../imgs/todo-icon.jpeg" height="30px"> Currently save & load method is not suitable for [composition of delta model](compositon). Please wait for future releases. 
+<img src="../imgs/todo-icon.jpeg" height="30px"> Currently, the save & load method does not support [composition](composition) of delta models. Please wait for future releases. 
--- a/docs/source/notes/unifyname.md
+++ b/docs/source/notes/unifyname.md
@ -1,4 +1,4 @@
-(unifyname)=
+(commonstructure)=
 # Common Structure Mapping
@ -41,7 +41,7 @@ Visualize bert-base using a common structure name: The submodules that are not c
 ```{figure} ../imgs/commonstructure_vis.png
 :width: 600px
-:name: transformers_structure
+:name: commonstructure_vis
 ```
 (mappingexample)=
--- a/docs/source/notes/update.md
+++ b/docs/source/notes/update.md
@ -0,0 +1,29 @@
 # Update Logs and Known Issues
 ## Version 0.3.1
 - We update [must_try.py](https://github.com/thunlp/OpenDelta/tree/main/examples/unittest/must_try.py) for a simple introduction of the core functionality of OpenDelta.
 - Thanks to [Weilin Zhao](https://github.com/Achazwl), we merged the long-developed branch parallel_adapter into the main branch.
 ## Version 0.3.0
 ### Updates:
 - Add this changelog for a granular record of updates.
 - The default configuration of delta models can be applied to more wrapped models.
  - There is less need to configure 'modified_modules' for wrapped models like [BertForSequenceClassification](https://huggingface.co/docs/transformers/main/en/model_doc/bert#transformers.BertForSequenceClassification) or even [OpenMatch.DRModel](https://github.com/OpenMatch/OpenMatch/blob/master/src/openmatch/modeling/dense_retrieval_model.py#L37), as long as they contain a model for which we support a default configuration. **Note that if you customize `modified_modules` yourself, most PyTorch models are supported.**
 - LoRA and BitFit models no longer need pseudo data to be instantiated.
 - BitFit models can now support [Conv1D](https://huggingface.co/docs/transformers/v4.23.1/en/internal/modeling_utils#transformers.Conv1D) using default configuration.
 - Improve type hint for AutoDeltaModel.
 - Fix bugs in documentation.
 - Fix small bugs when saving a model without a config attribute.
 - Make the default modified modules of adapter-like methods more accurate: attach the adapter-like modules after the output of attention layer and second feed-forward layer, both before the layernorm layers. 
 - A simple unit test folder containing development-time tests has been added for interested users.
 ### Known Issues
 - SoftPrompt is still not supported for wrapped model if the model has no attribute `get_input_embeddings`.
 - Prefix Tuning is still limited to T5, GPT2, Bart, Bert, Roberta.
 ## Version 0.2.4
 ### Updates
 - examples/examples_seq2seq and examples/examples_text-classification are deprecated and moved to [legacy](https://github.com/thunlp/OpenDelta/tree/main/examples/legacies)
 - Thanks to [Zhen Zhang](https://github.com/namezhenzhang), we provide [examples_prompt](https://github.com/thunlp/OpenDelta/tree/main/examples/examples_prompt), a cleaner and more general framework that unifies the delta tuning paradigm and the prompt-tuning paradigm. It is still based on [Huggingface Trainers](https://huggingface.co/docs/transformers/main_classes/trainer). In this example framework, the running pipeline is [a unified script](https://github.com/thunlp/OpenDelta/tree/main/examples/examples_prompt/src), and the differences in tasks, models, delta tuning models, and even prompt-tuning paradigms are [more modular and more independent](https://github.com/thunlp/OpenDelta/tree/main/examples/examples_prompt/backbones). Please try it out!
--- a/docs/source/notes/usage.md
+++ b/docs/source/notes/usage.md
@ -12,7 +12,7 @@ model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base")
 ## STEP 2: Add delta modules
 We provide two alternatives to add the delta modules.
 ### 2.1 Modification based on visualization
-Suppose we want to make the feedforward layer of each block as our [modification target module](target_module),
+Suppose we want to make the feedforward layer of each block as our [modification target module](targetmodules),
 We should first know what is the name of the feedforward layer in the BART model by visualization. <img src="../imgs/hint-icon-2.jpg" height="30px"> *For more about visualization, see [Visualization](visualization).*
 ```python
@ -48,7 +48,7 @@ delta_model.log() # This will visualize the backbone after modification and othe
 ### 2.2 Use the default modification.
 We also provide the default modifications of each delta methods for some commonly used PTMs (e.g., BERT, RoBERTA, DistilBERT, T5, GPT2), so the users don't need to specify the submodules to modify.
-The default modifications is achieved by mapping a name of a submodule to it's name on a common transformer structure. <img src="../imgs/hint-icon-2.jpg" height="30px">  *For details about the common structure mapping, see [Common Structure Mapping](unifyname)*
+The default modifications are achieved by mapping the name of a submodule to its name in a common transformer structure. <img src="../imgs/hint-icon-2.jpg" height="30px">  *For details about the common structure mapping, see [Common Structure Mapping](commonstructure)*
--- a/docs/source/notes/visualization.md
+++ b/docs/source/notes/visualization.md
@ -1,4 +1,3 @@
 (visualization)=
 # Visualize the Parameters
 When OpenDelta makes modifications to a pretrained model (PTM), it is beneficial to know what your PTM looks like, especially the location of the parameters.
--- a/examples/examples_prompt/README.md
+++ b/examples/examples_prompt/README.md
@ -1,24 +1,59 @@
-# !!!!This example collection is still under develop, please wait for some time to use it.
+# Examples of using opendelta together with 🤗 transformers.
-## install the repo
+In this repo, we construct a very general pipeline to train and test a PLM using
 🤗 transformers.
 The pipeline was constructed together with [openpromptu](https://pypi.org/project/openpromptu/), which is a light and
 model-agnostic version of [openprompt](https://github.com/thunlp/OpenPrompt).
 ## Pool of PLMs
 We are going to adapt most of the models in 🤗 transformers
 in this repo. The model-specific pipelines, processing, and configurations are specified
 in `./backbones/`. You can add your own model to this directory to support customized models.
 ### An example script to run the repo in offline mode
 ```bash
-cd ../
+conda activate [YOURENV]
-python setup_seq2seq.py develop
+PATHBASE=[YOURPATH]
 JOBNAME="adapter_t5-base"
 DATASET="superglue-cb"
 cd $PATHBASE/OpenDelta/examples/examples_prompt/
 python configs/gen_t5.py --job $JOBNAME
 export TRANSFORMERS_OFFLINE=1
 export HF_DATASETS_OFFLINE=1
 python src/run.py configs/$JOBNAME/$DATASET.json \
 --model_name_or_path [YOURPATH_TO_T5_BASE] \
 --tokenizer_name [YOURPATH_TO_T5_BASE] \
 --datasets_saved_path [YOURPATH_TO_CB_DATASETS] \
 --finetuned_delta_path ${PATHBASE}/delta_checkpoints/ \
 --num_train_epochs 20 \
 --bottleneck_dim 24 \
 --delay_push True
 ```
 This will add `examples_seq2seq` to the environment path of the python lib.
-## Generating the json configuration file
+## An example of quickly testing the repo.
-```shell
+```bash
-python configs/gen_$BACKBONETYPE.py --job $YOURJOB
+conda activate [YOURENV]
-#e.g. python configs/gen_beit.py --job lora_beit-base-patch16-224
+PATHBASE=[YOURPATH]
 ```
 The available job configurations (e.g., `--job lora_beit-base-patch16-224`) can be seen in the scripts. You can also
 create your own configuration.
 JOBNAME="adapter_t5-base"
 DATASET="superglue-cb"
-## Run the code
+cd $PATHBASE/OpenDelta/examples/examples_prompt/
-```
+export TRANSFORMERS_OFFLINE=1
-CUDA_VISIBLE_DEVICES=1 python src/run.py configs/lora_beit-base-patch16-224/beans.json
+export HF_DATASETS_OFFLINE=1
-```
+export DELTACENTER_OFFLINE=0
 python src/test.py configs/$JOBNAME/$DATASET.json \
 --model_name_or_path [YOURPATH_TO_T5_BASE] \
 --tokenizer_name [YOURPATH_TO_T5_BASE] \
 --datasets_saved_path [YOURPATH_TO_CB_DATASETS] \
 --finetuned_delta_path thunlp/t5-base_adapter_superglue-cb_20220701171436c80 \
 --delta_cache_dir "./delta_checkpoints/" \
 --force_download True
 ```
--- a/examples/examples_prompt/backbones/bart.py
+++ b/examples/examples_prompt/backbones/bart.py
@ -26,14 +26,14 @@ def preprocess_function(raw_example, **kwargs):
    example = InputExample(**raw_example)
-    try:
+
-        example = verbalizer.wrap_one_example(example)
+    example = verbalizer.wrap_one_example(example)
-        example, other = template.wrap_one_example(example)
+    example, other = template.wrap_one_example(example)
-        input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
+    input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
-        model_inputs = tokenizer(input_sentence, max_length=256,
+    model_inputs = tokenizer(input_sentence, max_length=256,
-                            padding="max_length", truncation=True)
+                        padding="max_length", truncation=True)
-    except:
+
-        from IPython import embed; embed(header="Therer")
+
    with tokenizer.as_target_tokenizer():
        label = tokenizer(other['tgt_text']).input_ids
@ -43,7 +43,8 @@ def preprocess_function(raw_example, **kwargs):
 def get_backbone(model_args, **kwargs):
    config = AutoConfig.from_pretrained(
-        model_args.config_name if model_args.config_name else model_args.model_name_or_path,
+        # model_args.config_name if model_args.config_name else model_args.model_name_or_path,
        model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
--- a/examples/examples_prompt/backbones/beit.py
+++ b/examples/examples_prompt/backbones/beit.py
@ -8,7 +8,6 @@ from transformers import (
    AutoFeatureExtractor,
    AutoModelForImageClassification,
 )
 from transformers import ViTFeatureExtractor
 from transformers import Trainer as HfTrainer
 import torch.nn as nn
@ -26,9 +25,10 @@ def get_prompts(task, tokenizer, data_args, template_id="0", verbalizer_id="0"):
 def preprocess_function(raw_example, **kwargs):
    # from IPython import embed; embed(header="Therefa")
    tokenizer = kwargs['tokenizer']
-    model_inputs = tokenizer(raw_example['image'], return_tensors='pt')
+    # print(np.array(raw_example['img']).shape)
    model_inputs = tokenizer(np.array(raw_example['image']), return_tensors='pt')
    model_inputs['pixel_values'] = model_inputs['pixel_values'].squeeze()
-    model_inputs['labels'] = raw_example['labels']
+    model_inputs['labels'] = raw_example['label']
    return model_inputs
 def compute_metrics(eval_preds, dataset_name, eval_metric):
@ -55,7 +55,7 @@ def mask_token_func(tokenizer, ith_mask=0):
 def get_remove_columns(dataset_features):
    # dataset_features.pop("label")
-    print("remove_columns: {}".format(dataset_features))
+    # print("remove_columns: {}".format(dataset_features))
    return dataset_features
 class DataCollator(HfDataCollatorMixin):
--- a/examples/examples_prompt/backbones/bigbird_.py
+++ b/examples/examples_prompt/backbones/bigbird_.py
@ -0,0 +1,169 @@
 from openpromptu.data_utils import InputExample
 import torch
 from transformers.data.data_collator import torch_default_data_collator
 from transformers.data.data_collator import DataCollatorMixin as HfDataCollatorMixin
 from transformers.data.data_collator import DataCollatorForSeq2Seq as DataCollator
 import numpy as np
 from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
 )
 from transformers import Seq2SeqTrainer as HfSeq2SeqTrainer
 import copy
 from torch.nn import CrossEntropyLoss
 def preprocess_function(raw_example, **kwargs):
    tokenizer = kwargs['tokenizer']
    data_args = kwargs['data_args']
    template = kwargs['template']
    verbalizer = kwargs['verbalizer']
    tokenizer_wrapper = kwargs['tokenizer_wrapper']
    example = InputExample(**raw_example)
    # example = verbalizer.wrap_one_example(example)
    example, other = template.wrap_one_example(example)
    input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
    model_inputs = tokenizer(input_sentence, max_length=data_args.max_source_length,
                        padding="max_length", truncation=True)
    return model_inputs
 def compute_metrics(eval_preds, dataset_name, eval_metric):
    pass
 def mask_token_func(tokenizer, ith_mask=0):
    return tokenizer.pad_token
 def get_remove_columns(dataset_features):
    # dataset_features.remove("label")
    return dataset_features
 def get_prompts(task, tokenizer, data_args, template_id="0", verbalizer_id="0"):
    from openpromptu.prompts import GenerationVerbalizer
    from openpromptu.prompts import ManualTemplate
    from openpromptu import TokenizerWrapper
    template = ManualTemplate(text = task.templates_text[template_id])
    verbalizer = GenerationVerbalizer(tokenizer=tokenizer, classes = None, label_words=None)
    tokenizer_wrapper = TokenizerWrapper(max_seq_length=data_args.max_source_length, tokenizer=tokenizer, truncate_method="balanced", mask_token_func=mask_token_func)
    return template, verbalizer, tokenizer_wrapper
 def get_backbone(model_args, **kwargs):
    config = AutoConfig.from_pretrained(
        model_args.config_name if model_args.config_name else model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
    )
    # config.dropout_rate = 0.0
    tokenizer = AutoTokenizer.from_pretrained(
        model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
        use_fast=model_args.use_fast_tokenizer,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        from_tf=bool(".ckpt" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
        )
    return config, tokenizer, model
 class Trainer(HfSeq2SeqTrainer):
    def __init__(self, verbalizer=None, eval_task=None, **kwargs):
        super().__init__(**kwargs)
        self.eval_task = eval_task
        self.compute_metrics = self._compute_metrics
    def compute_loss(self, model, inputs, return_outputs=False):
        labels=copy.deepcopy(inputs['input_ids'])
        # labels[labels==self.tokenizer.pad_token_id]=-100
        outputs = model(**inputs)
        logits = outputs.logits
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        loss_fct = CrossEntropyLoss(ignore_index=self.tokenizer.pad_token_id)
        loss = loss_fct(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.long().view(-1))
        return (loss, outputs) if return_outputs else loss
    def prediction_step(
        self,
        model, #nn.Module,
        inputs, #Dict[str, Union[torch.Tensor, Any]],
        prediction_loss_only, #: bool,
        ignore_keys, #: Optional[List[str]] = None,
    ): #-> Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]:
        """
        Perform an evaluation step on :obj:`model` using :obj:`inputs`.
        Subclass and override to inject custom behavior.
        Args:
            model (:obj:`nn.Module`):
                The model to evaluate.
            inputs (:obj:`Dict[str, Union[torch.Tensor, Any]]`):
                The inputs and targets of the model.
                The dictionary will be unpacked before being fed to the model. Most models expect the targets under the
                argument :obj:`labels`. Check your model's documentation for all accepted arguments.
            prediction_loss_only (:obj:`bool`):
                Whether or not to return the loss only.
        Return:
            Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]: A tuple with the loss, logits and
            labels (each being optional).
        """
        if not self.args.predict_with_generate or prediction_loss_only:
            return super().prediction_step(
                model, inputs, prediction_loss_only=prediction_loss_only, ignore_keys=ignore_keys
            )
        inputs = self._prepare_inputs(inputs)
        with torch.no_grad():
            labels=copy.deepcopy(inputs['input_ids'])
            # labels[labels==self.tokenizer.pad_token_id]=-100
            outputs = model(**inputs)
            logits = outputs.logits
            shift_logits = logits[..., :-1, :].contiguous()
            shift_labels = labels[..., 1:].contiguous().long()
            loss_fct = CrossEntropyLoss(ignore_index=self.tokenizer.pad_token_id)
            loss = loss_fct(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.view(-1)).detach().cpu()
            loss = torch.where(torch.isnan(loss), torch.full_like(loss, 0), loss)
        if prediction_loss_only:
            return (loss, None, None)
        else:
            # non pad label
            shift_labels = shift_labels.view(-1).detach().cpu()
            nonpad_idx = shift_labels!=self.tokenizer.pad_token_id
            shift_labels = shift_labels[nonpad_idx]
            # the probability at the corresponding position
            shift_logits = shift_logits.view(-1, shift_logits.shape[-1])[nonpad_idx].detach().cpu()
            target_position = torch.nn.functional.one_hot(shift_labels,shift_logits.shape[-1]).bool().to(shift_labels.device)
            shift_logits = shift_logits.softmax(dim=-1)[target_position]
            return (loss, shift_logits, shift_labels)
    def _compute_metrics(self, eval_preds):
        preds, labels = eval_preds
        result = {}
        for metric in self.eval_task.metric:
            result.update(metric(preds, labels,ignore_index=self.tokenizer.pad_token_id))
        average_metric = sum(result.values())/len(result)
        result.update({"average_metrics":average_metric})
        return result
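The `compute_loss` override above reuses `input_ids` as labels, shifts them by one position, and ignores padding targets. The following is a minimal pure-Python sketch of that computation for a single sequence, not the file's actual code path. The function name `shifted_cross_entropy` and the list-of-lists input layout are assumptions made for illustration.

```python
import math

def shifted_cross_entropy(logits, input_ids, pad_token_id):
    """Mean next-token cross-entropy for one sequence (hypothetical helper).

    logits:    list of T rows, each a list of V floats (one row per position).
    input_ids: list of T token ids; they double as the labels.
    """
    total, count = 0.0, 0
    # Position t predicts token t+1: drop the last logit row and the first label.
    for row, target in zip(logits[:-1], input_ids[1:]):
        if target == pad_token_id:
            continue  # same effect as CrossEntropyLoss(ignore_index=pad_token_id)
        # Negative log-softmax of the target, via a stable log-sum-exp.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]
        count += 1
    # With no valid targets the mean is undefined; return NaN in that case.
    return total / count if count else float("nan")
```

A batch made up entirely of padding targets has no valid positions to average over, which is presumably why `prediction_step` replaces a NaN loss with 0.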
--- a/examples/examples_prompt/backbones/blenderbot.py
+++ b/examples/examples_prompt/backbones/blenderbot.py
@ -26,14 +26,13 @@ def preprocess_function(raw_example, **kwargs):
    example = InputExample(**raw_example)
-    try:
+   
-        example = verbalizer.wrap_one_example(example)
+    example = verbalizer.wrap_one_example(example)
-        example, other = template.wrap_one_example(example)
+    example, other = template.wrap_one_example(example)
-        input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
+    input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
-        model_inputs = tokenizer(input_sentence, max_length=data_args.max_source_length,
+    model_inputs = tokenizer(input_sentence, max_length=data_args.max_source_length,
-                            padding="max_length", truncation=True)
+                        padding="max_length", truncation=True)
-    except:
+
        from IPython import embed; embed(header="Therer")
    with tokenizer.as_target_tokenizer():
        label = tokenizer(other['tgt_text']).input_ids
@ -165,7 +164,7 @@ class Trainer(HfSeq2SeqTrainer):
        return (loss, generated_tokens, labels)
    def _compute_metrics(self, eval_preds):
-        from IPython import embed; embed(header="In compute metrics")
+        # from IPython import embed; embed(header="In compute metrics")
        preds, labels = eval_preds
        decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)
        decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)
--- a/examples/examples_prompt/backbones/opt.py
+++ b/examples/examples_prompt/backbones/opt.py
@ -0,0 +1,171 @@
 from openpromptu.data_utils import InputExample
 import torch
 from transformers.data.data_collator import torch_default_data_collator
 from transformers.data.data_collator import DataCollatorMixin as HfDataCollatorMixin
 from transformers.data.data_collator import DataCollatorForSeq2Seq as DataCollator
 import numpy as np
 from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
 )
 from transformers import Seq2SeqTrainer as HfSeq2SeqTrainer
 import copy
 from torch.nn import CrossEntropyLoss
 def preprocess_function(raw_example, **kwargs):
    tokenizer = kwargs['tokenizer']
    data_args = kwargs['data_args']
    template = kwargs['template']
    verbalizer = kwargs['verbalizer']
    tokenizer_wrapper = kwargs['tokenizer_wrapper']
    example = InputExample(**raw_example)
    # example = verbalizer.wrap_one_example(example)
    example, other = template.wrap_one_example(example)
    input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
    model_inputs = tokenizer(input_sentence, max_length=data_args.max_source_length,
                        padding="max_length", truncation=True)
    return model_inputs
 def compute_metrics(eval_preds, dataset_name, eval_metric):
    pass
 def mask_token_func(tokenizer, ith_mask=0):
    return tokenizer.pad_token
 def get_remove_columns(dataset_features):
    # dataset_features.remove("label")
    return dataset_features
 def get_prompts(task, tokenizer, data_args, template_id="0", verbalizer_id="0"):
    from openpromptu.prompts import GenerationVerbalizer
    from openpromptu.prompts import ManualTemplate
    from openpromptu import TokenizerWrapper
    template = ManualTemplate(text = task.templates_text[template_id])
    verbalizer = GenerationVerbalizer(tokenizer=tokenizer, classes = None, label_words=None)
    tokenizer_wrapper = TokenizerWrapper(max_seq_length=data_args.max_source_length, tokenizer=tokenizer, truncate_method="tail", mask_token_func=mask_token_func)
    return template, verbalizer, tokenizer_wrapper
 def get_backbone(model_args, **kwargs):
    config = AutoConfig.from_pretrained(
        model_args.config_name if model_args.config_name else model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
    )
    # config.dropout_rate = 0.0
    tokenizer = AutoTokenizer.from_pretrained(
        model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
        use_fast=model_args.use_fast_tokenizer,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
    )
    if getattr(tokenizer, "pad_token", None) is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        from_tf=bool(".ckpt" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
        )
    return config, tokenizer, model
 class Trainer(HfSeq2SeqTrainer):
    def __init__(self, verbalizer=None, eval_task=None, **kwargs):
        super().__init__(**kwargs)
        self.eval_task = eval_task
        self.compute_metrics = self._compute_metrics
    def compute_loss(self, model, inputs, return_outputs=False):
        labels=copy.deepcopy(inputs['input_ids'])
        # labels[labels==self.tokenizer.pad_token_id]=-100
        outputs = model(**inputs)
        logits = outputs.logits
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        loss_fct = CrossEntropyLoss(ignore_index=self.tokenizer.pad_token_id)
        loss = loss_fct(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.long().view(-1))
        return (loss, outputs) if return_outputs else loss
    def prediction_step(
        self,
        model, #nn.Module,
        inputs, #Dict[str, Union[torch.Tensor, Any]],
        prediction_loss_only, #: bool,
        ignore_keys, #: Optional[List[str]] = None,
    ): #-> Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]:
        """
        Perform an evaluation step on :obj:`model` using :obj:`inputs`.
        Subclass and override to inject custom behavior.
        Args:
            model (:obj:`nn.Module`):
                The model to evaluate.
            inputs (:obj:`Dict[str, Union[torch.Tensor, Any]]`):
                The inputs and targets of the model.
                The dictionary will be unpacked before being fed to the model. Most models expect the targets under the
                argument :obj:`labels`. Check your model's documentation for all accepted arguments.
            prediction_loss_only (:obj:`bool`):
                Whether or not to return the loss only.
        Return:
            Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]: A tuple with the loss, logits and
            labels (each being optional).
        """
        if not self.args.predict_with_generate or prediction_loss_only:
            return super().prediction_step(
                model, inputs, prediction_loss_only=prediction_loss_only, ignore_keys=ignore_keys
            )
        inputs = self._prepare_inputs(inputs)
        with torch.no_grad():
            labels=copy.deepcopy(inputs['input_ids'])
            # labels[labels==self.tokenizer.pad_token_id]=-100
            outputs = model(**inputs)
            logits = outputs.logits
            shift_logits = logits[..., :-1, :].contiguous()
            shift_labels = labels[..., 1:].contiguous().long()
            loss_fct = CrossEntropyLoss(ignore_index=self.tokenizer.pad_token_id)
            loss = loss_fct(shift_logits.view(-1, shift_logits.shape[-1]), shift_labels.view(-1)).detach().cpu()
            loss = torch.where(torch.isnan(loss), torch.full_like(loss, 0), loss)
        if prediction_loss_only:
            return (loss, None, None)
        else:
            # non pad label
            shift_labels = shift_labels.view(-1).detach().cpu()
            nonpad_idx = shift_labels!=self.tokenizer.pad_token_id
            shift_labels = shift_labels[nonpad_idx]
            # the probability at the corresponding position
            shift_logits = shift_logits.view(-1, shift_logits.shape[-1])[nonpad_idx].detach().cpu()
            target_position = torch.nn.functional.one_hot(shift_labels,shift_logits.shape[-1]).bool().to(shift_labels.device)
            shift_logits = shift_logits.softmax(dim=-1)[target_position]
            return (loss, shift_logits, shift_labels)
    def _compute_metrics(self, eval_preds):
        preds, labels = eval_preds
        result = {}
        for metric in self.eval_task.metric:
            result.update(metric(preds, labels,ignore_index=self.tokenizer.pad_token_id))
        average_metric = sum(result.values())/len(result)
        result.update({"average_metrics":average_metric})
        return result
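When `predict_with_generate` is set, the `prediction_step` above returns, for every non-padding target, the softmax probability the model assigned to that target, together with the target ids. The following is a minimal pure-Python sketch of that selection for a single sequence. The helper name `target_token_probs` and the list-based inputs are assumptions for illustration; the file itself does this with `one_hot` masking on tensors.

```python
import math

def target_token_probs(logits, input_ids, pad_token_id):
    """Probability assigned to each non-pad next token (hypothetical helper).

    logits:    list of T rows, each a list of V floats.
    input_ids: list of T token ids, also used as labels.
    Returns (probs, targets), two lists of equal length.
    """
    probs, targets = [], []
    # Same one-step shift as in the loss: row t is scored against token t+1.
    for row, target in zip(logits[:-1], input_ids[1:]):
        if target == pad_token_id:
            continue  # padding positions are filtered out before scoring
        m = max(row)
        exps = [math.exp(x - m) for x in row]  # numerically stable softmax
        probs.append(exps[target] / sum(exps))
        targets.append(target)
    return probs, targets
```

Because padding is identified by token id, any real token that shares the pad id is dropped as well. This matters if the fallback `tokenizer.pad_token = tokenizer.eos_token` in `get_backbone` takes effect, since end-of-sequence targets would then be excluded too.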
--- a/examples/examples_prompt/backbones/t5.py
+++ b/examples/examples_prompt/backbones/t5.py
@ -26,14 +26,13 @@ def preprocess_function(raw_example, **kwargs):
    example = InputExample(**raw_example)
-    try:
+ 
-        example = verbalizer.wrap_one_example(example)
+    example = verbalizer.wrap_one_example(example)
-        example, other = template.wrap_one_example(example)
+    example, other = template.wrap_one_example(example)
-        input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
+    input_sentence = tokenizer_wrapper.merge_wrapped_example(example)
-        model_inputs = tokenizer(input_sentence, max_length=256,
+    model_inputs = tokenizer(input_sentence, max_length=256,
-                            padding="max_length", truncation=True)
+                        padding="max_length", truncation=True)
-    except:
+
        from IPython import embed; embed(header="Therer")
    with tokenizer.as_target_tokenizer():
        label = tokenizer(other['tgt_text']).input_ids
--- a/examples/examples_prompt/backbones/vit.py
+++ b/examples/examples_prompt/backbones/vit.py
--- a/examples/examples_prompt/collect_result.jsonl
+++ b/examples/examples_prompt/collect_result.jsonl
@ -1,59 +0,0 @@
 # the final results will be populated here.{
    "evaluate": {
        "epoch": 20.0,
        "eval_accuracy": 89.2156862745098,
        "eval_average_metrics": 90.76168929110105,
        "eval_f1": 92.3076923076923,
        "eval_loss": 0.16493959724903107,
        "eval_runtime": 1.6391,
        "eval_samples_per_second": 124.455
    },
    "repo_name": "DeltaHub/bitfit_t5-base_mrpc",
    "test": {
        "epoch": 20.0,
        "test_accuracy": 88.23529411764706,
        "test_average_metrics": 89.97971602434077,
        "test_f1": 91.72413793103448,
        "test_loss": 0.14968213438987732,
        "test_runtime": 1.6344,
        "test_samples_per_second": 124.82
    }
 }
 {
    "evaluate": {
        "epoch": 20.0,
        "eval_average_metrics": 52.10265668831534,
        "eval_loss": 0.3603779077529907,
        "eval_matthews_correlation": 52.10265668831534,
        "eval_runtime": 1.0808,
        "eval_samples_per_second": 482.046
    },
    "repo_name": "DeltaHub/bitfit_t5-base_cola",
    "test": {
        "epoch": 20.0,
        "test_average_metrics": 54.209563471221934,
        "test_loss": 0.2853100299835205,
        "test_matthews_correlation": 54.209563471221934,
        "test_runtime": 1.056,
        "test_samples_per_second": 494.304
    }
 }
 {
    "evaluate": {
        "epoch": 20.0,
        "eval_average_metrics": 53.80613287067274,
        "eval_loss": 0.25723716616630554,
        "eval_matthews_correlation": 53.80613287067274,
        "eval_runtime": 1.0583,
        "eval_samples_per_second": 492.299
    },
    "repo_name": "DeltaHub/bitfit_t5-base_cola",
    "test": {
        "epoch": 20.0,
        "test_average_metrics": 54.32497579543861,
        "test_loss": 0.22327613830566406,
        "test_matthews_correlation": 54.32497579543861,
        "test_runtime": 1.0556,
        "test_samples_per_second": 494.507
    }
 }
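The `*_average_metrics` fields in the records above are consistent with the `sum(result.values())/len(result)` pattern used by the `_compute_metrics` methods in this commit. A short check with values copied from the first record (the function name `average_metrics` is chosen here for illustration):

```python
def average_metrics(result):
    """Mean of all metric values, mirroring sum(result.values()) / len(result)."""
    return sum(result.values()) / len(result)

# Values copied from the "evaluate" block of the bitfit_t5-base_mrpc record.
mrpc_eval = {"accuracy": 89.2156862745098, "f1": 92.3076923076923}
mrpc_avg = average_metrics(mrpc_eval)

# With a single metric the average equals that metric, as in the cola records.
cola_avg = average_metrics({"matthews_correlation": 52.10265668831534})
```

Here `mrpc_avg` agrees with the recorded `eval_average_metrics` of 90.76168929110105 up to floating-point rounding.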
--- a/examples/examples_prompt/configs/adapter_clip-vit-base-patch32/beans.json
+++ b/examples/examples_prompt/configs/adapter_clip-vit-base-patch32/beans.json
@ -0,0 +1,48 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "beans",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/clip-vit-base-patch32",
    "num_classes": 3,
    "num_train_epochs": 20,
    "output_dir": "outputs/adapter/clip-vit-base-patch32/beans",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_delta_center": true,
    "push_to_hub": false,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "split_validation_test": true,
    "task_name": "beans",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "beans",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/clip-vit-base-patch32",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_prompt/configs/adapter_opt-350m/wikitext.json
+++ b/examples/examples_prompt/configs/adapter_opt-350m/wikitext.json
@ -0,0 +1,53 @@
 {
    "backbone_model": "opt",
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": true,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "wikitext",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "gradient_accumulation_steps":2,
    "greater_is_better": false,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 900,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/opt-350m",
    "model_path_public": "opt-350m",
    "num_train_epochs": 3,
    "output_dir": "outputs/adapter/opt-350m/wikitext",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 6,
    "per_device_train_batch_size": 6,
    "predict_with_generate": true,
    "push_to_dc": true,
    "push_to_hf": false,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "split_validation_test": true,
    "task_name": "wikitext",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "wikitext",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/opt-350m",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "warmup_steps": 0,
    "modified_modules":["self_attn"]
 }
--- a/examples/examples_prompt/configs/adapter_vit-large-patch16-224-in21k/beans.json
+++ b/examples/examples_prompt/configs/adapter_vit-large-patch16-224-in21k/beans.json
@ -0,0 +1,53 @@
 {
    "backbone_model": "vit",
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": false,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "beans",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/vit-large-patch16-224-in21k",
    "model_path_public": "vit-large-patch16-224-in21k",
    "num_classes": 3,
    "num_train_epochs": 20,
    "output_dir": "outputs/adapter/vit-large-patch16-224-in21k/beans",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": false,
    "push_to_dc": true,
    "push_to_hf": false,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "split_validation_test": true,
    "task_name": "beans",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "beans",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/vit-large-patch16-224-in21k",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "warmup_steps": 0,
    "modified_modules":["output"]
 }
--- a/examples/examples_prompt/configs/bitfit_t5-large/rte.json
+++ b/examples/examples_prompt/configs/bitfit_t5-large/rte.json
@ -0,0 +1,51 @@
 {
    "backbone_model": "t5-large",
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": true,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
    "delta_type": "bitfit",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "rte",
    "eval_steps": 100,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/t5-large",
    "model_path_public": "t5-large",
    "num_train_epochs": 20,
    "output_dir": "outputs/bitfit/t5-large/rte",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 16,
    "per_device_train_batch_size": 16,
    "predict_with_generate": true,
    "push_to_dc": true,
    "push_to_hf": false,
    "save_steps": 100,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "split_validation_test": true,
    "task_name": "rte",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "rte",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/t5-large",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "warmup_steps": 0,
    "modified_modules":["attn", "ff", "layer_norm"]
 }
--- a/examples/examples_prompt/configs/compacter_blenderbot-3b/sst2.json
+++ b/examples/examples_prompt/configs/compacter_blenderbot-3b/sst2.json
@ -0,0 +1,66 @@
 {
    "backbone_model": "blenderbot",
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": true,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
    "delta_type": "compacter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "sst2",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "factorized_phm": true,
    "factorized_phm_rule": false,
    "gradient_clip": false,
    "greater_is_better": true,
    "hypercomplex_adapters": true,
    "hypercomplex_division": 4,
    "hypercomplex_nonlinearity": "glorot-uniform",
    "learn_phm": true,
    "learning_rate": 0.003,
    "load_best_model_at_end": true,
    "max_source_length": 128,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/blenderbot-3b",
    "model_path_public": "blenderbot-3b",
    "non_linearity": "gelu_new",
    "normalize_phm_weight": false,
    "num_train_epochs": 3,
    "output_dir": "outputs/compacter/blenderbot-3b/sst2",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "phm_c_init": "normal",
    "phm_clamp": false,
    "phm_init_range": 0.0001,
    "predict_with_generate": true,
    "push_to_dc": true,
    "push_to_hf": false,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "shared_phm_rule": false,
    "split_validation_test": true,
    "task_name": "sst2",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "sst2",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/blenderbot-3b",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "use_bias_down_sampler": true,
    "use_bias_up_sampler": true,
    "warmup_steps": 0,
    "modified_modules":["fc2"]
 }
--- a/examples/examples_prompt/configs/compacter_deberta-v2-xlarge/mnli.json
+++ b/examples/examples_prompt/configs/compacter_deberta-v2-xlarge/mnli.json
@ -0,0 +1,51 @@
 {
    "backbone_model": "deberta-v2-xlarge",
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": true,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
    "delta_type": "compacter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "mnli",
    "eval_steps": 500,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "is_seq2seq": false,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/deberta-v2-xlarge",
    "num_train_epochs": 3,
    "output_dir": "outputs/compacter/deberta-v2-xlarge/mnli",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": false,
    "push_to_dc": true,
    "push_to_hub": false,
    "save_steps": 500,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "split_validation_test": true,
    "task_name": "mnli",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "mnli",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/deberta-v2-xlarge",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "warmup_steps": 0,
    "modified_modules":["attention"]
 }
--- a/examples/examples_prompt/configs/compacter_long-t5-tglobal-large/rte.json
+++ b/examples/examples_prompt/configs/compacter_long-t5-tglobal-large/rte.json
@ -0,0 +1,51 @@
 {
    "backbone_model": "long-t5",
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": true,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
    "delta_type": "compacter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "rte",
    "eval_steps": 100,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/long-t5-tglobal-large",
    "model_path_public": "long-t5-tglobal-large",
    "num_train_epochs": 20,
    "output_dir": "outputs/compacter/long-t5-tglobal-large/rte",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 16,
    "per_device_train_batch_size": 16,
    "predict_with_generate": true,
    "push_to_dc": true,
    "push_to_hf": false,
    "save_steps": 100,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "split_validation_test": true,
    "task_name": "rte",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "rte",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/long-t5-tglobal-large",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "warmup_steps": 0,
    "modified_modules":["attn", "ff", "layer_norm"]
 }
--- a/examples/examples_prompt/configs/gen_bart.py
+++ b/examples/examples_prompt/configs/gen_bart.py
@ -71,8 +71,21 @@ AllConfigs['adapter_bart-base'].update({
                                "output_dir": "outputs/adapter/bart-base/",
                            })
-AllConfigs['lora_bart-base'] = copy.deepcopy(BaseConfigs['bart-base'])
+AllConfigs['parallel_adapter_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
-AllConfigs['lora_bart-base'].update({
+AllConfigs['parallel_adapter_t5-base'].update({
                                "delta_type": "parallel_adapter",
                                "learning_rate": 3e-4,
                                "unfrozen_modules": [
                                    "deltas",
                                    "layer_norm",
                                    "final_layer_norm"
                                ],
                                "bottleneck_dim":24,
                                "output_dir": "outputs/parallel_adapter/t5-base/",
                            })
 AllConfigs['lora_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
 AllConfigs['lora_t5-base'].update({
                                "delta_type": "lora",
                                "learning_rate": 3e-4,
                                "unfrozen_modules": [
--- a/examples/examples_prompt/configs/gen_clip.py
+++ b/examples/examples_prompt/configs/gen_clip.py
@ -2,7 +2,7 @@ import collections
 import copy
 PATHBASE="/mnt/sfs_turbo/hsd/plm_cache/"
-PATHBASE="/home/hushengding/plm_cache/"
+# PATHBASE="/home/hushengding/plm_cache/"
 AllConfigs = {}
--- a/examples/examples_prompt/configs/gen_t5.py
+++ b/examples/examples_prompt/configs/gen_t5.py
@ -45,11 +45,14 @@ BaseConfigs['t5-base'] = {
                "greater_is_better": True,
                "evaluation_strategy": "steps",
                "overwrite_output_dir": True,
-                "push_to_hub": False,
+                "push_to_hf": False,
-                "push_to_delta_center": True,
+                "push_to_dc": True,
                "save_strategy": "steps",
                "datasets_load_from_disk": True,
-                "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/"
+                "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
                "backbone_model": "t5", # use in delta center,
                "model_path_public": "t5-base", # use in delta center,
            }
 AllConfigs['bitfit_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
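The generator scripts derive each per-method config by deep-copying a base config and then calling `update` with method-specific keys. The following is a minimal sketch of that pattern. The helper `derive_config` and the nested `dataset_config_name` entry in the base dict are assumptions for illustration; the remaining keys are taken from the hunks above.

```python
import copy

def derive_config(base, overrides):
    """Return a new config: a deep copy of `base` with `overrides` applied."""
    cfg = copy.deepcopy(base)  # nested lists/dicts are copied, not shared
    cfg.update(overrides)
    return cfg

base_t5 = {
    "push_to_hf": False,
    "push_to_dc": True,
    "backbone_model": "t5",
    "model_path_public": "t5-base",
    "dataset_config_name": ["en"],  # assumed nested value, for illustration
}

parallel_adapter = derive_config(base_t5, {
    "delta_type": "parallel_adapter",
    "learning_rate": 3e-4,
    "bottleneck_dim": 24,
    "output_dir": "outputs/parallel_adapter/t5-base/",
})

# Mutating a nested value in the derived config leaves the base untouched.
parallel_adapter["dataset_config_name"].append("de")
```

A shallow copy would share the nested list between the base and every derived config, so an in-place change to one would silently leak into the others.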
--- a/examples/examples_prompt/configs/lora_beit-large-patch16-224/cifar10.json
+++ b/examples/examples_prompt/configs/lora_beit-large-patch16-224/cifar10.json
@ -0,0 +1,52 @@
 {
    "backbone_model": "beit",
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": true,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk",
    "delta_type": "lora",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "cifar10",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/beit-large-patch16-224",
    "model_path_public": "beit-large-patch16-224",
    "num_classes": 10,
    "num_train_epochs": 20,
    "output_dir": "outputs/lora/beit-large-patch16-224/cifar10",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": false,
    "push_to_dc": true,
    "push_to_hf": false,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "split_validation_test": true,
    "task_name": "cifar10",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "cifar10",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/beit-large-patch16-224",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "warmup_steps": 0,
    "modified_modules":["query","value"]
 }
--- a/examples/examples_prompt/configs/lora_gpt-j-6B/wikitext.json
+++ b/examples/examples_prompt/configs/lora_gpt-j-6B/wikitext.json
@ -0,0 +1,52 @@
 {
    "backbone_model": "gpt-j",
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": true,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
    "delta_type": "lora",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "wikitext",
    "eval_steps": 500,
    "evaluation_strategy": "steps",
    "gradient_accumulation_steps":4,
    "greater_is_better": false,
    "learning_rate": 0.00003,
    "load_best_model_at_end": true,
    "max_source_length": 512,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/gpt-j-6B",
    "model_path_public": "gpt-j-6B",
    "num_train_epochs": 2,
    "output_dir": "outputs/lora/gpt-j-6B/wikitext",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 2,
    "per_device_train_batch_size": 2,
    "predict_with_generate": true,
    "push_to_dc": true,
    "push_to_hf": false,
    "save_steps": 500,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "split_validation_test": true,
    "task_name": "wikitext",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "wikitext",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/gpt-j-6B",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "warmup_steps": 0,
    "modified_modules":["20.attn.q_proj","21.attn.q_proj","22.attn.q_proj","23.attn.q_proj","24.attn.q_proj","25.attn.q_proj","26.attn.q_proj","27.attn.q_proj"]
 }
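
Note on the hand-written `modified_modules` list in the GPT-J config: it names `q_proj` in layers 20 through 27. A small sketch of generating the same list programmatically; the helper name `q_proj_suffixes` is hypothetical and does not exist in the repository.

```python
import json

def q_proj_suffixes(first_layer: int, stop_layer: int) -> list:
    """Return name suffixes such as '20.attn.q_proj' for layers first_layer..stop_layer-1."""
    return [f"{i}.attn.q_proj" for i in range(first_layer, stop_layer)]

# Layers 20..27, matching the list written out in the config above.
modified_modules = q_proj_suffixes(20, 28)
print(json.dumps({"modified_modules": modified_modules}))
```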
--- a/examples/examples_prompt/configs/lora_roberta-large/superglue-boolq.json
+++ b/examples/examples_prompt/configs/lora_roberta-large/superglue-boolq.json
@ -0,0 +1,52 @@
 {
    "backbone_model": "roberta-large",
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": true,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
    "delta_type": "lora",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "superglue-boolq",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "is_seq2seq": false,
    "learning_rate": 0.0001,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/roberta-large",
    "model_path_public": "roberta-large",
    "num_train_epochs": 20,
    "output_dir": "outputs/lora/roberta-large/superglue-boolq",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": false,
    "push_to_hub": false,
    "push_to_dc": true,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "split_validation_test": true,
    "task_name": "superglue-boolq",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "superglue-boolq",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/roberta-large",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "warmup_steps": 0,
    "modified_modules":["query","value"]
 }
--- a/examples/examples_prompt/configs/lora_xlm-roberta-large/superglue-wic.json
+++ b/examples/examples_prompt/configs/lora_xlm-roberta-large/superglue-wic.json
@ -0,0 +1,52 @@
 {
    "backbone_model": "xlm-roberta-large",
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": true,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
    "delta_type": "lora",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "superglue-wic",
    "eval_steps": 100,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "is_seq2seq": false,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/xlm-roberta-large",
    "model_path_public": "xlm-roberta-large",
    "num_train_epochs": 20,
    "output_dir": "outputs/lora/xlm-roberta-large/superglue-wic",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 16,
    "per_device_train_batch_size": 16,
    "predict_with_generate": false,
    "push_to_dc": true,
    "push_to_hub": false,
    "save_steps": 100,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "split_validation_test": true,
    "task_name": "superglue-wic",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "superglue-wic",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/xlm-roberta-large",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "warmup_steps": 0,
    "modified_modules":["query","value"]
 }
--- a/examples/examples_prompt/configs/low_rank_adapter_gpt2/wikitext.json
+++ b/examples/examples_prompt/configs/low_rank_adapter_gpt2/wikitext.json
@ -0,0 +1,52 @@
 {
    "backbone_model": "gpt2",
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": true,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
    "delta_type": "low_rank_adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "wikitext",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "gradient_accumulation_steps":1,
    "greater_is_better": false,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 768,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/gpt2",
    "model_path_public": "gpt2",
    "num_train_epochs": 2,
    "output_dir": "outputs/low_rank_adapter/gpt2/wikitext",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 16,
    "per_device_train_batch_size": 16,
    "predict_with_generate": true,
    "push_to_dc": true,
    "push_to_hf": false,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "split_validation_test": true,
    "task_name": "wikitext",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "wikitext",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/gpt2",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "warmup_steps": 0,
    "modified_modules":["attn","mlp"]
 }
--- a/examples/examples_prompt/configs/prefix_bert-large-cased/rte.json
+++ b/examples/examples_prompt/configs/prefix_bert-large-cased/rte.json
@ -0,0 +1,51 @@
 {
    "backbone_model": "bert-large-cased",
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": true,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
    "delta_type": "prefix",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "rte",
    "eval_steps": 100,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "is_seq2seq": false,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/bert-large-cased",
    "num_train_epochs": 20,
    "output_dir": "outputs/prefix/bert-large-cased/rte",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 16,
    "per_device_train_batch_size": 16,
    "predict_with_generate": false,
    "push_to_dc": true,
    "push_to_hub": false,
    "save_steps": 100,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "split_validation_test": true,
    "task_name": "rte",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "rte",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/bert-large-cased",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm"
    ],
    "warmup_steps": 0,
    "modified_modules":["attention"]
 }
--- a/examples/examples_prompt/configs/soft_prompt_bart-large/superglue-boolq.json
+++ b/examples/examples_prompt/configs/soft_prompt_bart-large/superglue-boolq.json
@ -0,0 +1,51 @@
 {
    "backbone_model": "bart",
    "dataset_config_name": [
        "en"
    ],
    "datasets_load_from_disk": true,
    "datasets_saved_path": "/mnt/sfs_turbo/hsd/huggingface_datasets/saved_to_disk/",
    "delta_type": "soft_prompt",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "superglue-boolq",
    "eval_steps": 500,
    "evaluation_strategy": "steps",
    "gradient_accumulation_steps":1,
    "greater_is_better": true,
    "learning_rate": 0.1,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "average_metrics",
    "model_name_or_path": "/mnt/sfs_turbo/hsd/plm_cache/bart-large",
    "model_path_public": "bart-large",
    "num_train_epochs": 50,
    "output_dir": "outputs/soft_prompt/bart-large/superglue-boolq",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_dc": true,
    "push_to_hf": false,
    "save_steps": 500,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "soft_token_num":100,
    "split_validation_test": true,
    "task_name": "superglue-boolq",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "superglue-boolq",
    "tokenizer_name": "/mnt/sfs_turbo/hsd/plm_cache/bart-large",
    "token_init": true,
    "unfrozen_modules": [
        "deltas"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_prompt/data_processors/processor.py
+++ b/examples/examples_prompt/data_processors/processor.py
@ -93,4 +93,10 @@ class AbstractTask(abc.ABC):
            # shuffles the data and samples it.
            if n_obs is not None:
                dataset = self.subsample(dataset, n_obs)
-        return dataset.map(self.preprocessor)
+
        this_method = getattr(self.__class__, 'preprocessor')
        base_method = getattr(AbstractTask, 'preprocessor')
        if this_method is not base_method:
            return dataset.map(self.preprocessor)
        else:
            return dataset
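
Note on the override check added to `processor.py`: the change skips `dataset.map` when a task does not define its own `preprocessor`. A self-contained sketch of that check, with simplified stand-in classes (`PlainTask`, `CustomTask` and the method `uses_custom_preprocessor` are hypothetical names used only for illustration):

```python
import abc

class AbstractTask(abc.ABC):
    def preprocessor(self, example):
        # Default: return the example unchanged.
        return example

    def uses_custom_preprocessor(self) -> bool:
        # Compare the function objects looked up on the classes. A subclass
        # that does not override `preprocessor` resolves to the base function.
        this_method = getattr(self.__class__, "preprocessor")
        base_method = getattr(AbstractTask, "preprocessor")
        return this_method is not base_method

class PlainTask(AbstractTask):
    pass

class CustomTask(AbstractTask):
    def preprocessor(self, example):
        return {**example, "seen": True}

print(PlainTask().uses_custom_preprocessor(), CustomTask().uses_custom_preprocessor())
```

The comparison is done on the class attributes rather than on `self.preprocessor`, because each access to a bound method creates a new object and an identity test on bound methods would not be reliable.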
--- a/examples/examples_prompt/data_processors/tasks.py
+++ b/examples/examples_prompt/data_processors/tasks.py
@ -12,22 +12,16 @@ import logging
 import numpy as np
 import torch
 import re
 from openprompt.prompts import ManualTemplate, ManualVerbalizer
 from openprompt.plms.utils import TokenizerWrapper
 from openprompt.data_utils import InputExample
 from openprompt.prompts import GenerationVerbalizer
 import itertools
-
+import os
 logger = logging.getLogger(__name__)
 from transformers.models.auto.tokenization_auto import tokenizer_class_from_name
 from typing import List, Dict
 from collections import defaultdict
 from openprompt.utils import round_list
 import warnings
@ -68,7 +62,8 @@ class COLA(AbstractTask):
    }
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.cola")[split]
        else:
            return datasets.load_dataset('glue', 'cola',
@ -96,7 +91,8 @@ class SST2(AbstractTask):
    }
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.sst2")[split]
        else:
            return datasets.load_dataset('glue', 'sst2',
@ -123,10 +119,9 @@ class MRPC(AbstractTask):
    }
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.mrpc")[split]
        else:
            return datasets.load_dataset('glue', 'mrpc', split=split, script_version="master")
@ -152,7 +147,8 @@ class QQP(AbstractTask):
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.qqp")[split]
        else:
            return datasets.load_dataset('glue', 'qqp',
@ -208,7 +204,8 @@ class MNLI(AbstractTask):
    }
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.mnli")[split]
        else:
            return datasets.load_dataset('glue', 'mnli', split=split, script_version="master")
@ -243,7 +240,8 @@ class QNLI(AbstractTask):
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.qnli")[split]
        else:
            return datasets.load_dataset('glue', 'qnli', split=split, script_version="master")
@ -279,7 +277,8 @@ class RTE(AbstractTask):
    }
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.rte")[split]
        else:
            return datasets.load_dataset('glue', 'rte',
@ -306,7 +305,8 @@ class WNLI(AbstractTask):
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/glue.wnli")[split]
        else:
            return datasets.load_dataset('glue', 'wnli', split=split, script_version="master")
@ -334,7 +334,8 @@ class SuperGLUEBoolQ(AbstractTask):
    }
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.boolq")[split]
        else:
            return datasets.load_dataset('super_glue', 'boolq', split=split, script_version="master")
@ -347,8 +348,8 @@ class SuperGLUECB(AbstractTask):
    split_to_data_split = {"train": "train",
                           "validation": "validation",
                           "test": "validation"}
-    metric = [metrics.mean_multiclass_f1(num_classes=3), metrics.accuracy]
+    metric = [metrics.accuracy]
-    metric_names = ["f1_multiclass", "accuracy"]
+    metric_names = ["accuracy"]
    verbalizers = {
        "0":{"0": "yes",
@ -361,7 +362,8 @@ class SuperGLUECB(AbstractTask):
    }
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.cb")[split]
        else:
            return datasets.load_dataset('super_glue', 'cb', split=split, script_version="master")
@ -387,7 +389,8 @@ class SuperGLUECOPA(AbstractTask):
    }
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.copa")[split]
        else:
            return datasets.load_dataset('super_glue', 'copa', split=split, script_version="master")
@ -416,7 +419,8 @@ class SuperGLUEMultiRC(AbstractTask):
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.multirc")[split]
        else:
            return datasets.load_dataset('super_glue', 'multirc', split=split, script_version="master")
@ -459,7 +463,8 @@ class SuperGLUEWIC(AbstractTask):
    }
    def load_dataset(self, split):
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.wic")[split]
        else:
            return datasets.load_dataset('super_glue', 'wic', split=split, script_version="master")
@ -549,13 +554,76 @@ class Beans(AbstractTask):
    def load_dataset(self, split):
        # from IPython import embed; embed(header="beans")
-        if self.data_args.datasets_load_from_disk:
+        offline = os.environ.get("HF_DATASETS_OFFLINE", "0")
-            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/super_glue.wic")[split]
+        if offline == '1':
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/beans")[split]
        else:
            return datasets.load_dataset('beans', split=split, script_version="master")
 class Wikitext(AbstractTask):
    #wikitext-2-v1
    name = "wikitext"
    # labels_list = ['angular_leaf_spot', 'bean_rust', "healthy"]
    split_to_data_split = {"train": "train",
                           "validation": "validation",
                           "test": "validation"}
    metric = [metrics.perplexity]
    metric_names = ["perplexity"]
    verbalizers = {
        "0": {
        }
    }
    templates_text = {
        "0": """{"meta":"text"}"""
    }
    split_valid_to_make_test = True
    def load_dataset(self, split):
        # from IPython import embed; embed(header="beans")
        if self.data_args.datasets_load_from_disk:
            return datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/wikitext")[split]
        else:
            return datasets.load_dataset('wikitext','wikitext-2-v1', split=split, script_version="master")
 class Cifar10(AbstractTask):
    name = "cifar10"
    split_to_data_split = {"train": "train",
                           "validation": "test",
                           "test": "test"}
    metric = [metrics.accuracy]
    metric_names = ["accuracy"]
    def load_dataset(self, split):
        if self.data_args.datasets_load_from_disk:
            d = datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/cifar10")[split].select(range(100))
            print(d)
            return d
        else:
            return datasets.load_dataset('cifar10', split=split, script_version="master")
    # def preprocessor(self, example):
    #     example_ = {}
    #     example_["image"] = example["image"]
    #     example_["labels"] = example["label"]
    #     return example_
 class Fashion_MNIST(AbstractTask):
    name = "Fashion-MNIST"
    split_to_data_split = {"train": "train",
                           "validation": "test",
                           "test": "test"}
    metric = [metrics.accuracy]
    metric_names = ["accuracy"]
    def load_dataset(self, split):
        if self.data_args.datasets_load_from_disk:
            d = datasets.load_from_disk(f"{self.data_args.datasets_saved_path}/fashion_mnist")[split]
            print(d)
            return d
        else:
            return datasets.load_dataset('fashion_mnist', split=split, script_version="master")
 TASK_MAPPING = OrderedDict(
    [
@ -575,7 +643,10 @@ TASK_MAPPING = OrderedDict(
        ('superglue-multirc', SuperGLUEMultiRC),
        ('superglue-wic', SuperGLUEWIC),
        # ('superglue-record', SuperGLUERecord)
-        ('beans', Beans)
+        ('beans', Beans),
        ('wikitext',Wikitext),
        ('cifar10',Cifar10),
        ('fashion_mnist',Fashion_MNIST)
    ]
 )
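
Note on the offline switch in `tasks.py`: most `load_dataset` methods now read `HF_DATASETS_OFFLINE` instead of `data_args.datasets_load_from_disk`, while `Wikitext`, `Cifar10` and `Fashion_MNIST` in this diff still use the old flag. A sketch of the repeated check factored into one place; the helper names below are hypothetical, and nothing is actually loaded so the sketch runs without the `datasets` package.

```python
import os

def use_offline_copy(environ=None) -> bool:
    """Mirror the diff's check: only the exact string '1' enables offline mode."""
    env = os.environ if environ is None else environ
    return env.get("HF_DATASETS_OFFLINE", "0") == "1"

def dataset_location(saved_path, local_name, hub_args, environ=None):
    """Return ('disk', path) or ('hub', hub_args) without loading anything."""
    if use_offline_copy(environ):
        return ("disk", f"{saved_path}/{local_name}")
    return ("hub", hub_args)

print(dataset_location("/data", "glue.cola", ("glue", "cola"), {"HF_DATASETS_OFFLINE": "1"}))
```

A caller would pass the first result to `datasets.load_from_disk` and the second to `datasets.load_dataset`, as the methods in the diff do inline.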
--- a/examples/examples_prompt/metrics/metrics.py
+++ b/examples/examples_prompt/metrics/metrics.py
@ -11,6 +11,14 @@ import sklearn.metrics
 logger = getLogger(__name__)
 def perplexity(outputs, targets,ignore_index=-100):
    """Computes the perplexity."""
    ce = -np.log(outputs).mean()
    # ce = F.cross_entropy(torch.Tensor(outputs).view(-1, outputs.shape[-1]), torch.Tensor(targets).view(-1).long(),ignore_index=ignore_index)
    return {"perplexity":float(np.exp(ce))}
 def accuracy(predictions, targets) -> dict:
    """Computes the average accuracy."""
    return {"accuracy": 100 * ((np.array(predictions) == np.array(targets)).mean())}
@ -47,20 +55,20 @@ def spearman_corrcoef(predictions, targets) -> dict:
-def spearman_corrcoef(predictions, targets) -> dict:
+# def spearman_corrcoef(predictions, targets) -> dict:
-    """Computes Spearman correlation coefficient."""
+#     """Computes Spearman correlation coefficient."""
-    # TODO: we need to do postprocessors in a clean way for each dataset.
+#     # TODO: we need to do postprocessors in a clean way for each dataset.
-    from examples_seq2seq.data_processors.postprocessors import string_to_float
+#     from examples_seq2seq.data_processors.postprocessors import string_to_float
-    targets = [string_to_float(target) for target in targets]
+#     targets = [string_to_float(target) for target in targets]
-    predictions= [string_to_float(prediction) for prediction in predictions]
+#     predictions= [string_to_float(prediction) for prediction in predictions]
-    spearman_corrcoef = 100 * scipy.stats.spearmanr(targets, predictions)[0]
+#     spearman_corrcoef = 100 * scipy.stats.spearmanr(targets, predictions)[0]
-    # Note that if all the predictions will be the same, spearman
+#     # Note that if all the predictions will be the same, spearman
-    # correlation is nan, to gaurad against this, we check the output
+#     # correlation is nan, to guard against this, we check the output
-    # and return 0 in this case.
+#     # and return 0 in this case.
-    if math.isnan(spearman_corrcoef):
+#     if math.isnan(spearman_corrcoef):
-        spearman_corrcoef = 0
+#         spearman_corrcoef = 0
-    return {"spearmanr": spearman_corrcoef}
+#     return {"spearmanr": spearman_corrcoef}
 def f1_score_with_invalid(predictions, targets) -> dict:
@ -102,8 +110,8 @@ def f1_score(predictions, targets) -> dict:
    Returns:
      F1 score, where any prediction != 0 or 1 is counted as wrong.
    """
-    targets = targets.astype(np.int32)
+    targets = np.array(targets).astype(np.int32)
-    predictions = predictions.astype(np.int32)
+    predictions = np.array(predictions).astype(np.int32)
    return {"f1": 100 * sklearn.metrics.f1_score(targets, predictions)}
 # TODO: maybe guard against invalid values https://stackoverflow.com/questions/56865344/how-do-i-calculate-the-matthews-correlation-coefficient-in-tensorflow
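
Note on the new `perplexity` metric: it computes `exp(-mean(log(outputs)))`, which reads as if `outputs` holds the probability the model assigned to each target token. That interpretation is an assumption; the diff does not show how `outputs` is produced. Under that assumption, a pure-Python sketch of the same formula:

```python
import math

def perplexity_from_probs(target_probs):
    """exp of the mean negative log-probability of the target tokens.

    Assumes every entry is the probability (0 < p <= 1) of the correct token.
    """
    nll = [-math.log(p) for p in target_probs]
    return math.exp(sum(nll) / len(nll))

# A model that is uniformly unsure among 4 choices has perplexity 4.
print(perplexity_from_probs([0.25, 0.25, 0.25, 0.25]))
```

Lower is better, which is consistent with the wikitext configs setting `"greater_is_better": false`.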
--- a/examples/examples_prompt/src/run.py
+++ b/examples/examples_prompt/src/run.py
@ -26,10 +26,12 @@ You can also adapt this script on your own tasks.
 import os
 import sys
 os.environ['MKL_THREADING_LAYER'] = 'GNU'
 os.environ['MKL_SERVICE_FORCE_INTEL'] = '1'
 os.environ["TOKENIZERS_PARALLELISM"] = "false"
 sys.path.append(os.path.join(os.getcwd(), "../"))
 # sys.path.append(os.path.join(os.getcwd(), "/mnt/sfs_turbo/zhangzhen/OpenDelta"))
 sys.path.append(os.path.join(os.getcwd()))
 import functools
@ -56,7 +58,7 @@ from transformers.trainer_utils import is_main_process, get_last_checkpoint
 from data_processors import AutoTask #, #TaskDataCollatorForSeq2Seq, AutoPostProcessor, data_collator
 from utils import read_json, save_json
-from utils.args import ModelArguments, TrainingArguments, DataTrainingArguments, RemainArgHfArgumentParser
+from utils.args import ModelArguments, TrainingArguments, DataTrainingArguments, DeltaArguments, RemainArgHfArgumentParser
 logger = logging.getLogger(__name__)
@ -66,16 +68,14 @@ def main():
    # See all possible arguments in src/transformers/training_args.py
    # or by passing the --help flag to this script.
    # We now keep distinct sets of args, for a cleaner separation of concerns.
-    parser = RemainArgHfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
+    parser = RemainArgHfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments, DeltaArguments))
    if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
        # If we pass only one argument to the script and it's the path to a json file,
        # let's parse it to get our arguments.
        model_args, data_args, training_args, delta_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
    else:
        model_args, data_args, training_args, delta_args = parser.parse_args_into_dataclasses(return_remaining_strings=True)
    # You can provide a JSON file that contains the arguments and use `--argument some_arg` to override or append to the JSON file.
    json_file, cmd_args = (os.path.abspath(sys.argv[1]), sys.argv[2:]) if sys.argv[1].endswith(".json") else (None, sys.argv[1:])
    model_args, data_args, training_args, delta_args, remain_args = parser.parse_json_file_with_cmd_args(json_file=json_file, command_line_args=cmd_args)
    logger.warning("The following arguments are not used: {}".format(remain_args))
-    print(f"{training_args.output_dir}/results.json")
+    logger.info(f"The results will be saved in {training_args.output_dir}/results.json")
    # exit()
    # Detecting last checkpoint.
    last_checkpoint = None
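
Note on `parse_json_file_with_cmd_args`: the comment in this hunk says a JSON file supplies the arguments and `--argument some_arg` pairs override or extend it. The parser's implementation is not part of this diff, so the following is only a sketch of that merge behaviour with a hypothetical helper, `merge_json_and_cli`, that keeps command-line values as strings.

```python
import json

def merge_json_and_cli(json_text, cmd_args):
    """Overlay `--key value` pairs on top of values read from JSON text."""
    merged = json.loads(json_text) if json_text else {}
    it = iter(cmd_args)
    for flag in it:
        if not flag.startswith("--"):
            raise ValueError(f"expected --key, got {flag!r}")
        value = next(it, None)
        if value is None:
            raise ValueError(f"missing value for {flag!r}")
        merged[flag[2:]] = value  # values stay strings in this sketch
    return merged

print(merge_json_and_cli('{"learning_rate": 0.0003, "seed": 42}',
                         ["--seed", "7", "--output_dir", "out"]))
```

A real parser would also convert each value to the type declared on the corresponding dataclass field and collect unknown keys separately, which is what `remain_args` appears to hold in the hunk above.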
@ -121,7 +121,8 @@ def main():
-    if os.path.basename(model_args.model_name_or_path).startswith("t5"):
+    if os.path.basename(model_args.model_name_or_path).startswith("t5") \
        or os.path.basename(model_args.model_name_or_path).startswith("long-t5") :
        from examples_prompt.backbones.t5 import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
        from examples_prompt.backbones.t5 import Trainer, DataCollator
    elif  os.path.basename(model_args.model_name_or_path).startswith("blenderbot"):
@ -129,7 +130,9 @@ def main():
        from examples_prompt.backbones.blenderbot import Trainer, DataCollator
    elif os.path.basename(model_args.model_name_or_path).startswith("roberta") \
        or os.path.basename(model_args.model_name_or_path).startswith("bert") \
-          or os.path.basename(model_args.model_name_or_path).startswith("albert") :
+          or os.path.basename(model_args.model_name_or_path).startswith("albert") \
            or os.path.basename(model_args.model_name_or_path).startswith("xlm-roberta") \
                or os.path.basename(model_args.model_name_or_path).startswith("deberta") :
        from examples_prompt.backbones.bert import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
        from examples_prompt.backbones.bert import Trainer, DataCollator
    elif os.path.basename(model_args.model_name_or_path).startswith("beit"):
@ -144,6 +147,10 @@ def main():
    elif os.path.basename(model_args.model_name_or_path).startswith("clip"):
        from examples_prompt.backbones.clip import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
        from examples_prompt.backbones.clip import Trainer, DataCollator
    elif os.path.basename(model_args.model_name_or_path).startswith("opt") \
        or os.path.basename(model_args.model_name_or_path).startswith("gpt"):
        from examples_prompt.backbones.opt import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
        from examples_prompt.backbones.opt import Trainer, DataCollator
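
Note on the backbone dispatch: the `startswith` chain grows with each model family, and names such as `xlm-roberta`, `deberta` and `long-t5` need their own entries because they do not begin with `roberta`, `bert` or `t5`. A table-driven sketch of the same lookup follows. It covers only the branches whose imports are visible in this diff; `BACKBONE_PREFIXES` and `backbone_for` are hypothetical names.

```python
import os

# (prefixes, backbone module name); module names follow the imports in the diff.
BACKBONE_PREFIXES = [
    (("t5", "long-t5"), "t5"),
    (("blenderbot",), "blenderbot"),
    (("roberta", "bert", "albert", "xlm-roberta", "deberta"), "bert"),
    (("clip",), "clip"),
    (("opt", "gpt"), "opt"),
]

def backbone_for(model_name_or_path: str) -> str:
    """Pick a backbone module from the last path component of the model path."""
    base = os.path.basename(model_name_or_path)
    for prefixes, backbone in BACKBONE_PREFIXES:
        # str.startswith accepts a tuple and matches any of its members.
        if base.startswith(prefixes):
            return backbone
    raise ValueError(f"no backbone registered for {base!r}")

print(backbone_for("/mnt/sfs_turbo/hsd/plm_cache/xlm-roberta-large"))
```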
@ -161,7 +168,8 @@ def main():
    if delta_args.delta_type.lower() != "none":
        from opendelta import AutoDeltaConfig,AutoDeltaModel
-        delta_config = AutoDeltaConfig.from_dict(vars(delta_args))
+        from dataclasses import asdict
        delta_config = AutoDeltaConfig.from_dict(asdict(delta_args))
        delta_model = AutoDeltaModel.from_config(delta_config, backbone_model=model)
        delta_model.freeze_module(set_state_dict = True)
        delta_model.log(delta_ratio=True, trainable_ratio=True, visualization=True)
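
Note on replacing `vars(delta_args)` with `asdict(delta_args)`: the commit does not state the motivation, but the two calls differ in a way that can matter when the resulting dict is modified later. `vars` returns the instance's own `__dict__`, whereas `dataclasses.asdict` builds a new dict and copies nested lists. A sketch with a hypothetical stand-in for `DeltaArguments`:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class DeltaArgs:  # hypothetical stand-in for the repository's DeltaArguments
    delta_type: str = "lora"
    modified_modules: list = field(default_factory=lambda: ["query", "value"])

args = DeltaArgs()
as_vars = vars(args)    # the instance's own __dict__, not a copy
as_dict = asdict(args)  # a new dict; nested lists are copied

as_vars["delta_type"] = "adapter"          # also changes args.delta_type
as_dict["modified_modules"].append("key")  # leaves args untouched

print(args.delta_type, args.modified_modules)
```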
@ -278,14 +286,9 @@ def main():
    if torch.cuda.is_available() and training_args.compute_memory:
        peak_memory = (torch.cuda.max_memory_allocated() / 1024 ** 2)/1000
        print(
            "Memory utilization",
            peak_memory,
            "GB"
        )
        performance_metrics.update({"peak_memory": peak_memory})
    if training_args.compute_memory or training_args.compute_time:
-        print("Efficiency Statistics {}".format(performance_metrics))
+        logger.info("Efficiency Statistics {}".format(performance_metrics))
        trainer.save_metrics("performance", performance_metrics)
    # Evaluation
@ -313,17 +316,30 @@ def main():
        trainer.save_metrics(f"{data_args.task_name}_test", metrics)
        all_results['test'][data_args.task_name] = metrics
    # from opendelta.utils.delta_hub import create_hub_repo_name
    # from opendelta.utils.delta_center import create_delta_center_args, create_repo_name
    # repo_name = create_hub_repo_name(root="DeltaHub",
    #                      dataset=data_args.task_name,
    #                      delta_type = delta_args.delta_type,
    #                      model_name_or_path= model_args.model_name_or_path)
-    # results['repo_name'] = repo_name
+
-    # if delta_args.delta_type.lower() != "none":
+    # center_args =
-    #     if training_args.push_to_hub: # TODO add description here
+    # repo_name = create_repo_name(prefix="", center_args=center_args)
-    #         delta_model.save_finetuned(push_to_hub=True, save_directory=repo_name, use_auth_token=True)
+    # all_results['repo_name'] = repo_name
-    #         # trainer.push_to_hub(**kwargs)
+
-    #     else:
+
-    #         delta_model.save_finetuned(push_to_hub=False, save_directory=repo_name, use_auth_token=True)
+    delta_model.save_finetuned(finetuned_delta_path=delta_args.finetuned_delta_path,
                               push_to_dc=training_args.push_to_dc,
                               center_args={"test_performance":all_results['test'][data_args.task_name]['test_average_metrics'],
                                            },
                               center_args_pool = {**vars(model_args), **vars(data_args), **vars(training_args), **vars(delta_args)},
                               list_tags = ['NLI'],
                               dict_tags = {'purpose':'for testing'},
                               delay_push=True,
                               test_result=all_results['test']
                            )
    with open(f"{training_args.output_dir}/results.json", 'w') as fout:
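The hunk above replaces `vars(delta_args)` with `asdict(delta_args)`. A likely reason, given how `DeltaArguments.merge_arguments` in `utils/args.py` swaps the instance's class for a dynamically built dataclass, is that the merged fields live as class-level defaults and never reach the instance `__dict__`. The following is a minimal sketch of that effect; `CommonArgs` and `AdapterArgs` are hypothetical stand-ins, not the repository's classes.

```python
import dataclasses
from dataclasses import asdict, dataclass


@dataclass
class CommonArgs:  # hypothetical stand-in for DeltaArguments
    delta_type: str = "adapter"

    def merge_arguments(self, other):
        # Same idea as in the diff: rebuild the class with the other
        # object's fields as defaults, then swap the instance's class.
        self.__class__ = dataclasses.make_dataclass(
            "MergedArgs",
            fields=[(f.name, f.type, getattr(other, f.name))
                    for f in dataclasses.fields(other)],
            bases=(CommonArgs,),
        )


@dataclass
class AdapterArgs:  # hypothetical stand-in for AdapterArguments
    bottleneck_dim: int = 24


common = CommonArgs()
common.merge_arguments(AdapterArgs(bottleneck_dim=32))

# vars() only sees attributes stored on the instance itself.
instance_view = vars(common)
# asdict() walks every dataclass field, including the merged ones.
field_view = asdict(common)
```

Under these assumptions `instance_view` lacks `bottleneck_dim`, while `field_view` contains it, so a config built from `vars()` would silently drop the delta-specific arguments.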
--- a/examples/examples_prompt/src/test.py
+++ b/examples/examples_prompt/src/test.py
@ -0,0 +1,344 @@
 # coding=utf-8
 # Copyright OpenDelta Team and THUNLP lab. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """
 A unified running script for most models to do downstream tasks in a
 prompt-learning fashion, i.e., no classification head; all tasks are cast
 to mask prediction or span prediction tasks.
 Processing specific to each backbone model is stored in ../backbones/.
 A few lines are added to integrate the delta tuning methods.
 You can also adapt this script to your own tasks.
 """
 import os
 import sys
 os.environ['MKL_THREADING_LAYER'] = 'GNU'
 os.environ['MKL_SERVICE_FORCE_INTEL'] = '1'
 os.environ["TOKENIZERS_PARALLELISM"] = "false"
 sys.path.append(os.path.join(os.getcwd(), "../"))
 sys.path.append(os.path.join(os.getcwd()))
 import functools
 import logging
 import torch
 import json
 import numpy as np
 import transformers
 from transformers import (
    AutoConfig,
    AutoModelForMaskedLM,
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    # HfArgumentParser,
    # MBartTokenizer,
    # default_data_collator,
    Trainer,
    Seq2SeqTrainer,
    set_seed,
 )
 from transformers.trainer_utils import is_main_process, get_last_checkpoint
 from data_processors import AutoTask #, #TaskDataCollatorForSeq2Seq, AutoPostProcessor, data_collator
 from utils import read_json, save_json
 from utils.args import ModelArguments, TrainingArguments, DataTrainingArguments, RemainArgHfArgumentParser, DeltaArguments
 logger = logging.getLogger(__name__)
 def main():
    parser = RemainArgHfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments, DeltaArguments))
    # You can provide a JSON file that contains the arguments and use `--argument some_arg` on the command line to override or extend it.
    json_file, cmd_args = (os.path.abspath(sys.argv[1]), sys.argv[2:]) if sys.argv[1].endswith(".json") else (None, sys.argv[1:])
    model_args, data_args, training_args, delta_args, remain_args = parser.parse_json_file_with_cmd_args(json_file=json_file, command_line_args=cmd_args)
    logger.warning("The following arguments are not used: {}".format(remain_args))
    # # exit()
    # # Detecting last checkpoint.
    # last_checkpoint = None
    # if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir:
    #     last_checkpoint = get_last_checkpoint(training_args.output_dir)
    #     print("#### last_checkpoint ", last_checkpoint)
    #     if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0:
    #         '''
    #         raise ValueError(
    #             f"Output directory ({training_args.output_dir}) already exists and is not empty. "
    #             "Use --overwrite_output_dir to overcome."
    #         )
    #         '''
    #         pass
    #     elif last_checkpoint is not None:
    #         logger.info(
    #             f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change "
    #             "the `--output_dir` or add `--overwrite_output_dir` to train from scratch."
    #         )
    # Setup logging
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        handlers=[logging.StreamHandler(sys.stdout)],
    )
    logger.setLevel(logging.INFO if is_main_process(training_args.local_rank) else logging.WARN)
    # Log on each process the small summary:
    logger.warning(
        f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
        + f", distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
    )
    # Set the verbosity to info of the Transformers logger (on main process only):
    if is_main_process(training_args.local_rank):
        transformers.utils.logging.set_verbosity_info()
    # logger.info("Training/evaluation parameters %s", training_args, model_args, data_args, delta_args)
    logger.info("{}\n{}\n{}\n{}".format(training_args, model_args, data_args, delta_args))
    # Set seed before initializing model.
    set_seed(training_args.seed)
    if os.path.basename(model_args.model_name_or_path).startswith("t5"):
        from examples_prompt.backbones.t5 import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
        from examples_prompt.backbones.t5 import Trainer, DataCollator
    elif  os.path.basename(model_args.model_name_or_path).startswith("blenderbot"):
        from examples_prompt.backbones.blenderbot import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
        from examples_prompt.backbones.blenderbot import Trainer, DataCollator
    elif os.path.basename(model_args.model_name_or_path).startswith("roberta") \
        or os.path.basename(model_args.model_name_or_path).startswith("bert") \
          or os.path.basename(model_args.model_name_or_path).startswith("albert") :
        from examples_prompt.backbones.bert import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
        from examples_prompt.backbones.bert import Trainer, DataCollator
    elif os.path.basename(model_args.model_name_or_path).startswith("beit"):
        from examples_prompt.backbones.beit import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
        from examples_prompt.backbones.beit import Trainer, DataCollator
    elif os.path.basename(model_args.model_name_or_path).startswith("bart"):
        from examples_prompt.backbones.bart import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
        from examples_prompt.backbones.bart import Trainer, DataCollator
    elif os.path.basename(model_args.model_name_or_path).startswith("bigbird"):
        from examples_prompt.backbones.bigbird import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
        from examples_prompt.backbones.bigbird import Trainer, DataCollator
    elif os.path.basename(model_args.model_name_or_path).startswith("clip"):
        from examples_prompt.backbones.clip import get_backbone, preprocess_function, mask_token_func, get_remove_columns, get_prompts
        from examples_prompt.backbones.clip import Trainer, DataCollator
    config, tokenizer, model = get_backbone(model_args=model_args)
    # model parallelize
    if hasattr(training_args, "model_parallel") and training_args.model_parallel:
        logger.info('parallelize model!')
        model.parallelize()
    from opendelta import Visualization
    Visualization(model).structure_graph()
    if delta_args.delta_type.lower() != "none":
        from opendelta.delta_models.adapter import AdapterConfig, AdapterModel
        delta_config = AdapterConfig.from_finetuned(finetuned_delta_path=delta_args.finetuned_delta_path)
        delta_model = AdapterModel.from_finetuned(finetuned_delta_path=delta_args.finetuned_delta_path,
                    delta_config=delta_config,
                    backbone_model=model,
                    force_download=delta_args.force_download,
                    cache_dir=delta_args.delta_cache_dir)
        # delta_model.freeze_module(set_state_dict = True)
        delta_model.log(delta_ratio=True, trainable_ratio=True, visualization=True)
    performance_metrics = {}
    non_empty_splits_names = []
    # if training_args.do_train:
    #     non_empty_splits_names.append("train")
    # if training_args.do_eval:
    #     non_empty_splits_names.append("eval")
    if training_args.do_test:
        non_empty_splits_names.append("test")
    splits = {}
    for split_name in ['test']:
        if split_name not in non_empty_splits_names:
            splits[split_name] = None
            continue
        task = AutoTask.get(data_args.task_name,
                            data_args.dataset_config_name,
                            data_args=data_args,
                            seed=data_args.data_sample_seed)
        dataset =  task.get(split=split_name,
                            split_validation_test=training_args.split_validation_test,
                            n_obs=data_args.max_train_samples)
        template, _verbalizer, tokenizer_wrapper = get_prompts(task, tokenizer, data_args)
        dataset = dataset.map(
                            functools.partial(preprocess_function,
                                            data_args=data_args,
                                            tokenizer=tokenizer,
                                            template=template,
                                            verbalizer=_verbalizer,
                                            tokenizer_wrapper=tokenizer_wrapper,
                                            split=split_name),
                            batched=False,
                            num_proc=data_args.preprocessing_num_workers,
                            remove_columns=get_remove_columns(list(dataset.features.keys())),
                            load_from_cache_file=not data_args.overwrite_cache,
                        )
        # from IPython import embed; embed()
        splits[split_name] = dataset
        if split_name == "test":
            eval_task = task
            verbalizer = _verbalizer
    trainer = Trainer(
        model=model,
        verbalizer=verbalizer,
        eval_task=eval_task,
        args=training_args,
        # train_dataset=splits['train'],
        # eval_dataset=splits['eval'],
        tokenizer=tokenizer,
        data_collator=DataCollator(tokenizer),
    )
    def save_training_config(config_file, output_dir):
        json_data = read_json(config_file)
        save_json(os.path.join(output_dir, "training_config.json"), json_data)
    # Saves training config.
    if trainer.is_world_process_zero():
        save_training_config(sys.argv[1], training_args.output_dir)
    # # Training
    # if training_args.do_train:
    #     checkpoint = None
    #     if training_args.resume_from_checkpoint is not None:
    #         checkpoint = training_args.resume_from_checkpoint
    #     elif last_checkpoint is not None:
    #         checkpoint = last_checkpoint
    #     if training_args.compute_time:
    #         torch.cuda.synchronize()  # wait for move to complete
    #         start = torch.cuda.Event(enable_timing=True)
    #         end = torch.cuda.Event(enable_timing=True)
    #         start.record()
    #     train_result = trainer.train(resume_from_checkpoint=checkpoint)
    #     if training_args.compute_time:
    #         end.record()
    #         torch.cuda.synchronize()  # wait for all_reduce to complete
    #         total_time = start.elapsed_time(end)/(1000*60)
    #         performance_metrics.update({"total_time in minutes ": total_time})
    #     trainer.save_model()  # Saves the tokenizer too for easy upload
    #     train_metrics = train_result.metrics
    #     max_train_samples = (
    #         data_args.max_train_samples if data_args.max_train_samples is not None else len(splits['train'])
    #     )
    #     train_metrics["train_samples"] = min(max_train_samples, len(splits['train']))
    #     trainer.log_metrics("train", train_metrics)
    #     trainer.save_metrics("train", train_metrics)
    #     trainer.save_state()
    # if torch.cuda.is_available() and training_args.compute_memory:
    #     peak_memory = (torch.cuda.max_memory_allocated() / 1024 ** 2)/1000
    #     print(
    #         "Memory utilization",
    #         peak_memory,
    #         "GB"
    #     )
    #     performance_metrics.update({"peak_memory": peak_memory})
    # if training_args.compute_memory or training_args.compute_time:
    #     print("Efficiency Statistics {}".format(performance_metrics))
    #     trainer.save_metrics("performance", performance_metrics)
    # Evaluation
    all_results = {}
    # all_results['evaluate'] = {}
    # if training_args.do_eval:
    #     logger.info("*** Evaluate ***")
    #     metrics = trainer.evaluate(eval_dataset=splits['eval'],
    #     )
    #     trainer.log_metrics(f"{data_args.task_name}_eval", metrics)
    #     trainer.save_metrics(f"{data_args.task_name}_eval", metrics)
    #     all_results['evaluate'][data_args.task_name] = metrics
    # Test
    all_results['test'] = {}
    if training_args.do_test:
        logger.info("*** Test ***")
        metrics = trainer.evaluate(eval_dataset=splits['test'],
        metric_key_prefix="test"
        )
        trainer.log_metrics(f"{data_args.task_name}_test", metrics)
        trainer.save_metrics(f"{data_args.task_name}_test", metrics)
        all_results['test'][data_args.task_name] = metrics
    # from opendelta.utils.delta_hub import create_hub_repo_name
    # from opendelta.utils.delta_center import create_delta_center_args, create_repo_name
    # repo_name = create_hub_repo_name(root="DeltaHub",
    #                      dataset=data_args.task_name,
    #                      delta_type = delta_args.delta_type,
    #                      model_name_or_path= model_args.model_name_or_path)
    # center_args =
    # repo_name = create_repo_name(prefix="", center_args=center_args)
    # all_results['repo_name'] = repo_name
    # delta_model.save_finetuned(push_to_hf=training_args.push_to_hf,
    #                            push_to_dc=training_args.push_to_dc,
    #                            center_args={},
    #                            center_args_pool = {**vars(model_args), **vars(data_args), **vars(training_args), **vars(delta_args)},
    #                            delay_push=True,
    #                         )
    print(all_results)
    # with open(f"{training_args.output_dir}/results.json", 'w') as fout:
    #     string = json.dumps(all_results, indent=4,sort_keys=True)
    #     fout.write(string+"\n")
    return all_results
 if __name__ == "__main__":
    result = main()
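The `if`/`elif` chain in `main()` picks a backbone module from the prefix of the model name and leaves `get_backbone` undefined when no branch matches. As a sketch only, the same dispatch can be written as a lookup table that fails with an explicit error; `BACKBONE_PREFIXES` and `resolve_backbone_module` are hypothetical names that do not exist in the repository.

```python
import os

# (model-name prefix, backbone module) pairs, mirroring the branches in test.py.
BACKBONE_PREFIXES = (
    ("t5", "t5"),
    ("blenderbot", "blenderbot"),
    ("roberta", "bert"),
    ("bert", "bert"),
    ("albert", "bert"),
    ("beit", "beit"),
    ("bart", "bart"),
    ("bigbird", "bigbird"),
    ("clip", "clip"),
)


def resolve_backbone_module(model_name_or_path: str) -> str:
    """Return the dotted module path handling this model, or raise ValueError."""
    name = os.path.basename(model_name_or_path)
    for prefix, module in BACKBONE_PREFIXES:
        if name.startswith(prefix):
            return f"examples_prompt.backbones.{module}"
    raise ValueError(f"No backbone registered for model name {name!r}")
```

The returned string could then be passed to `importlib.import_module` to obtain `get_backbone`, `Trainer`, and the other helpers, instead of repeating two import lines per branch.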
--- a/examples/examples_prompt/utils/args.py
+++ b/examples/examples_prompt/utils/args.py
@ -1,6 +1,10 @@
 from dataclasses import dataclass, field
 from typing import Optional, List
 from transformers import HfArgumentParser
 from pathlib import Path
 import sys
@dataclass
 class ModelArguments:
@ -81,6 +85,10 @@ class TrainingArguments(HfTrainingArguments):
    remove_unused_columns: Optional[bool] = field(
        default=False, metadata={"help": "Remove columns not required by the model when using an nlp.Dataset."}
    )
    push_to_hf: Optional[bool] = field(default=False, metadata={"help": "Push the model to huggingface model hub."})
    push_to_dc: Optional[bool] = field(default=True, metadata={"help": "Push the model to delta center."})
@ -211,28 +219,254 @@ class DataTrainingArguments:
            self.test_max_target_length = self.max_target_length
 import dataclasses
@dataclass
 class DeltaArguments:
    """
    Arguments pertaining to the delta model: its type, which modules are modified or unfrozen, and where checkpoints are loaded from or pushed to.
    """
    delta_type: str= field(default="", metadata={"help": "the type of delta"})
    backbone_model: Optional[str] = field(
        default="", metadata={"help": "the backbone model"}
    )
    model_path_public: Optional[str] = field(
        default="", metadata={"help": "the path (url) of the publicly available backbone model"}
    )
    modified_modules: Optional[List[str]] = field(
        default_factory=lambda: None, metadata={"help": "the modules inside the backbone to be modified"}
    )
    unfrozen_modules: Optional[List[str]] = field(
        default_factory=lambda:["deltas"], metadata={"help": "the modules inside the backbone or in the delta modules that need to be unfrozen"}
    )
    finetuned_delta_path: Optional[str] = field(
        default=None, metadata={"help": "the path of the finetuned delta model"}
    )
    force_download: Optional[bool] = field(
        default=False, metadata={"help": "whether to download the checkpoint from delta center even if it already exists locally"}
    )
    local_files_only: Optional[bool] = field(
        default=False, metadata={"help": "whether to skip looking for files in delta center"}
    )
    delta_cache_dir: Optional[str] = field(
        default=None, metadata={"help": "The cache path defined by the user. If not set, we first look into the"+
        " working directory and then into the default cache path (usually ~/.cache/delta_center)."}
    )
    delay_push: Optional[bool] = field(
        default=True, metadata={
            'help':'whether to push the checkpoint to delta center later.'
        }
    )
    def merge_arguments(self, objb):
        print(objb)
        self.__class__ = dataclasses.make_dataclass('DeltaArgument', fields=[(s.name, s.type, getattr(objb, s.name)) for s in dataclasses.fields(objb)], bases=(DeltaArguments,))
@dataclass
 class AdapterArguments:
    bottleneck_dim: Optional[int] = field(
        default=24, metadata={"help": "the dimension of the bottleneck layer"}
    )
@dataclass
 class LoRAArguments:
    lora_r: Optional[int] = field(
        default=8, metadata={"help": "the rank of the LoRA matrices."}
    )
@dataclass
 class PrefixArguments:
    pass
@dataclass
 class BitFitArguments:
    pass
@dataclass
 class SoftPromptArguments:
    soft_token_num: Optional[int] = field(
        default=100, metadata={"help": "the num of soft tokens."}
    )
@dataclass
 class CompacterArguments:
    pass
@dataclass
 class LowRankAdapterArguments:
    pass
 # from opendelta.delta_models.adapter import AdapterConfig
 # from opendelta.delta_models.bitfit import BitFitConfig
 # from opendelta.delta_models.compacter import CompacterConfig
 # from opendelta.delta_models.lora import LoraArguments
 # from opendelta.delta_models.low_rank_adapter import LowRankAdapterConfig
 # from opendelta.delta_models.prefix import PrefixConfig
 # from opendelta.delta_models.soft_prompt import SoftPromptConfig
 # DELTAARGMAP = {
 #     "adapter": AdapterConfig,
 #     "lora":LoraArguments,
 #     "prefix":PrefixConfig,
 #     "bitfit":BitFitConfig,
 #     "soft_prompt":SoftPromptConfig,
 #     "compacter":CompacterConfig,
 #     "low_rank_adapter":LowRankAdapterConfig
 # }
 DELTAARGMAP = {
    "adapter": AdapterArguments,
    "lora":LoRAArguments,
    "prefix":PrefixArguments,
    "bitfit":BitFitArguments,
    "soft_prompt":SoftPromptArguments,
    "compacter":CompacterArguments,
    "low_rank_adapter":LowRankAdapterArguments
 }
 # TODO: add more specific delta arguments
 class RemainArgHfArgumentParser(HfArgumentParser):
-    def parse_json_file(self, json_file: str, return_remaining_args=True ):
+    '''This is a more powerful version of argument parser.
    It can receive both command-line arguments and JSON-file arguments.
    The command-line arguments will override the JSON-file arguments.
    The parser loads the specific delta arguments (e.g., Adapter's)
    according to the delta_type argument and merges them
    with the common delta arguments.
    '''
    def parse_json_file_with_cmd_args(self, json_file: str, command_line_args=None, return_remaining_args=True ):
        """
        Alternative helper method that does not use `argparse` at all, instead loading a json file and populating the
        dataclass types.
        """
-        import argparse
+
        import json
        from pathlib import Path
-        import dataclasses
+
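        # test.py passes json_file=None when no JSON config is given; fall back to an empty config.
        data = json.loads(Path(json_file).read_text()) if json_file is not None else {}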
        data_str = ""
        if command_line_args is None:
            command_line_args = []
        for key in data:
            if "--"+key not in command_line_args:
                if isinstance(data[key], list):
                    data_str += "--"+key
                    for elem in data[key]:
                        data_str+=" "+ str(elem)
                    data_str += " "
                else:
                    data_str+= "--" + key + " " + str(data[key]) + " "
        data_list = data_str.split()
        data_list += command_line_args
        if return_remaining_args:
            outputs, remain_args = self.parse_args_into_dataclasses(args=data_list, return_remaining_strings=return_remaining_args)
            for d in outputs:
                if isinstance(d, DeltaArguments): # merge the specific delta arguments
                    d.merge_arguments(outputs[-1])
            return  [*(outputs[:-1]), remain_args]
        else:
            outputs = self.parse_args_into_dataclasses(args=data_list, return_remaining_strings=return_remaining_args)
            for d in outputs:
                if isinstance(d, DeltaArguments):
                    d.merge_arguments(outputs[-1])
            return [*(outputs[:-1]),]
    def parse_args_into_dataclasses(
        self, args=None, return_remaining_strings=False, look_for_args_file=True, args_filename=None
    ):
        """
        Parse command-line args into instances of the specified dataclass types.
        This relies on argparse's `ArgumentParser.parse_known_args`. See the doc at:
        docs.python.org/3.7/library/argparse.html#argparse.ArgumentParser.parse_args
        Args:
            args:
                List of strings to parse. The default is taken from sys.argv. (same as argparse.ArgumentParser)
            return_remaining_strings:
                If true, also return a list of remaining argument strings.
            look_for_args_file:
                If true, will look for a ".args" file with the same base name as the entry point script for this
                process, and will append its potential content to the command line args.
            args_filename:
                If not None, will use this file instead of the ".args" file specified in the previous argument.
        Returns:
            Tuple consisting of:
                - the dataclass instances in the same order as they were passed to the initializer.
                - if applicable, an additional namespace for more (non-dataclass backed) arguments added to the parser
                  after initialization.
                - The potential list of remaining argument strings. (same as argparse.ArgumentParser.parse_known_args)
        """
        if args_filename or (look_for_args_file and len(sys.argv)):
            if args_filename:
                args_file = Path(args_filename)
            else:
                args_file = Path(sys.argv[0]).with_suffix(".args")
            if args_file.exists():
                fargs = args_file.read_text().split()
                args = fargs + args if args is not None else fargs + sys.argv[1:]
                # in case of duplicate arguments the first one has precedence
                # so we append rather than prepend.
        namespace, remaining_args = self.parse_known_args(args=args)
        # conditionally add delta arguments
        deltatype_args = DELTAARGMAP[namespace.delta_type]
        self.dataclass_types.append(deltatype_args)
        self._add_dataclass_arguments(deltatype_args)
        # parse the arguments again, this time with the specific delta type's arguments
        namespace, remaining_args = self.parse_known_args(args=args)
        outputs = []
        for dtype in self.dataclass_types:
            keys = {f.name for f in dataclasses.fields(dtype) if f.init}
-            inputs = {k: data.pop(k) for k in list(data.keys()) if k in keys}
+            inputs = {k: v for k, v in vars(namespace).items() if k in keys}
            for k in keys:
                delattr(namespace, k)
            obj = dtype(**inputs)
            outputs.append(obj)
-
+        if len(namespace.__dict__) > 0:
-        remain_args = argparse.ArgumentParser()
+            # additional namespace.
-        remain_args.__dict__.update(data)
+            outputs.append(namespace)
-        if return_remaining_args:
+        if return_remaining_strings:
-            return (*outputs, remain_args)
+            return (outputs, remaining_args)
        else:
-            return (*outputs,)
+            if remaining_args:
                raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
            return outputs
        # namespace, remaining_args = self.parse_known_args(args=data_list)
        # print("Here", command_line_args, data_list,namespace, remaining_args)
        # data.update(remain_args)
        # outputs = []
        # for dtype in self.dataclass_types:
        #     keys = {f.name for f in dataclasses.fields(dtype) if f.init}
        #     inputs = {k: namespace.get(k) for k in list(data.keys()) if k in keys}
        #     obj = dtype(**inputs)
        #     outputs.append(obj)
        # # remain_args = argparse.ArgumentParser()
        # remain_args.__dict__.update(remain_args)
        # if return_remaining_args:
        #     return (*outputs, remain_args)
        # else:
        #     return (*outputs,)
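`parse_json_file_with_cmd_args` above flattens the JSON config into an argv-style list and skips any key that also appears on the command line, which is how command-line values take precedence. A minimal standalone sketch of that merge follows; `json_to_argv` is a hypothetical helper, and unlike the diff (which builds one string and calls `split()`), it keeps every value as a single token so values containing spaces are not broken apart.

```python
from typing import Any, Dict, List


def json_to_argv(config: Dict[str, Any], command_line_args: List[str]) -> List[str]:
    """Flatten a JSON config into argv tokens, letting the command line win."""
    argv: List[str] = []
    for key, value in config.items():
        if f"--{key}" in command_line_args:
            # The same flag was given on the command line: drop the JSON value.
            continue
        argv.append(f"--{key}")
        if isinstance(value, list):
            # List values become several tokens after one flag.
            argv.extend(str(v) for v in value)
        else:
            argv.append(str(value))
    return argv + list(command_line_args)


config = {
    "delta_type": "adapter",
    "unfrozen_modules": ["deltas", "classifier"],
    "learning_rate": 0.0003,
}
merged = json_to_argv(config, ["--learning_rate", "1e-4"])
```

Here `merged` keeps `--delta_type` and `--unfrozen_modules` from the JSON config and takes `--learning_rate` from the command line. The resulting list can be handed to `parse_args_into_dataclasses(args=...)`.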
--- a/examples/examples_text-classification/configs/adapter_roberta-base/cola.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/cola.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "cola",
    "eval_steps": 100,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 128,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 20,
    "output_dir": "outputs/adapter/roberta-base/cola",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 100,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "cola",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "cola",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/mnli.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/mnli.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "mnli",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 128,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 3,
    "output_dir": "outputs/adapter/roberta-base/mnli",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "mnli",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "mnli",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/mrpc.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/mrpc.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "mrpc",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 128,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 20,
    "output_dir": "outputs/adapter/roberta-base/mrpc",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "mrpc",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "mrpc",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/qnli.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/qnli.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "qnli",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 128,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 3,
    "output_dir": "outputs/adapter/roberta-base/qnli",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "qnli",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "qnli",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/qqp.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/qqp.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "qqp",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 128,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 3,
    "output_dir": "outputs/adapter/roberta-base/qqp",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "qqp",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "qqp",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/rte.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/rte.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "rte",
    "eval_steps": 100,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 128,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 20,
    "output_dir": "outputs/adapter/roberta-base/rte",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": false,
    "save_steps": 100,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "rte",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "rte",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/sst2.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/sst2.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "sst2",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 128,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 3,
    "output_dir": "outputs/adapter/roberta-base/sst2",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "sst2",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "sst2",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/stsb.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/stsb.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "stsb",
    "eval_steps": 100,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 128,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 20,
    "output_dir": "outputs/adapter/roberta-base/stsb",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 100,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "stsb",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "stsb",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/superglue-boolq.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/superglue-boolq.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "superglue-boolq",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 20,
    "output_dir": "outputs/adapter/roberta-base/superglue-boolq",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "superglue-boolq",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "superglue-boolq",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/superglue-cb.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/superglue-cb.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "superglue-cb",
    "eval_steps": 100,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 20,
    "output_dir": "outputs/adapter/roberta-base/superglue-cb",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 100,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "superglue-cb",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "superglue-cb",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/superglue-copa.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/superglue-copa.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "superglue-copa",
    "eval_steps": 50,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 40,
    "output_dir": "outputs/adapter/roberta-base/superglue-copa",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 50,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "superglue-copa",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "superglue-copa",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/superglue-multirc.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/superglue-multirc.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "superglue-multirc",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 3,
    "output_dir": "outputs/adapter/roberta-base/superglue-multirc",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "superglue-multirc",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "superglue-multirc",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/superglue-record.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/superglue-record.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "superglue-record",
    "eval_steps": 200,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 512,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 3,
    "output_dir": "outputs/adapter/roberta-base/superglue-record",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 16,
    "per_device_train_batch_size": 16,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 200,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "superglue-record",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "superglue-record",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/superglue-wic.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/superglue-wic.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "superglue-wic",
    "eval_steps": 100,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 20,
    "output_dir": "outputs/adapter/roberta-base/superglue-wic",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 100,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "superglue-wic",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "superglue-wic",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/examples_text-classification/configs/adapter_roberta-base/superglue-wsc.fixed.json
+++ b/examples/examples_text-classification/configs/adapter_roberta-base/superglue-wsc.fixed.json
@ -0,0 +1,46 @@
 {
    "bottleneck_dim": 24,
    "dataset_config_name": [
        "en"
    ],
    "delta_type": "adapter",
    "do_eval": true,
    "do_test": true,
    "do_train": true,
    "eval_dataset_config_name": [
        "en"
    ],
    "eval_dataset_name": "superglue-wsc.fixed",
    "eval_steps": 100,
    "evaluation_strategy": "steps",
    "greater_is_better": true,
    "learning_rate": 0.0003,
    "load_best_model_at_end": true,
    "max_source_length": 256,
    "metric_for_best_model": "eval_accuracy",
    "model_name_or_path": "roberta-base",
    "num_train_epochs": 20,
    "output_dir": "outputs/adapter/roberta-base/superglue-wsc.fixed",
    "overwrite_output_dir": true,
    "per_device_eval_batch_size": 32,
    "per_device_train_batch_size": 32,
    "predict_with_generate": true,
    "push_to_hub": true,
    "save_steps": 100,
    "save_strategy": "steps",
    "save_total_limit": 1,
    "seed": 42,
    "task_name": "superglue-wsc.fixed",
    "test_dataset_config_name": [
        "en"
    ],
    "test_dataset_name": "superglue-wsc.fixed",
    "tokenizer_name": "roberta-base",
    "unfrozen_modules": [
        "deltas",
        "layer_norm",
        "final_layer_norm",
        "classifier"
    ],
    "warmup_steps": 0
 }
--- a/examples/legacies/examples_seq2seq/README.md
+++ b/examples/legacies/examples_seq2seq/README.md
--- a/examples/legacies/examples_seq2seq/init.py
+++ b/examples/legacies/examples_seq2seq/init.py
--- a/examples/legacies/examples_seq2seq/configs/config_gen_bs.py
+++ b/examples/legacies/examples_seq2seq/configs/config_gen_bs.py
--- a/examples/legacies/examples_seq2seq/data_processors/init.py
+++ b/examples/legacies/examples_seq2seq/data_processors/init.py
--- a/examples/legacies/examples_seq2seq/data_processors/data_collator.py
+++ b/examples/legacies/examples_seq2seq/data_processors/data_collator.py
--- a/examples/legacies/examples_seq2seq/data_processors/postprocessors.py
+++ b/examples/legacies/examples_seq2seq/data_processors/postprocessors.py
--- a/examples/legacies/examples_seq2seq/data_processors/tasks.py
+++ b/examples/legacies/examples_seq2seq/data_processors/tasks.py
--- a/examples/legacies/examples_seq2seq/data_processors/utils.py
+++ b/examples/legacies/examples_seq2seq/data_processors/utils.py
--- a/examples/legacies/examples_seq2seq/metrics/init.py
+++ b/examples/legacies/examples_seq2seq/metrics/init.py
--- a/examples/legacies/examples_seq2seq/metrics/metrics.py
+++ b/examples/legacies/examples_seq2seq/metrics/metrics.py
--- a/examples/legacies/examples_seq2seq/metrics/qa_utils.py
+++ b/examples/legacies/examples_seq2seq/metrics/qa_utils.py
--- a/examples/legacies/examples_seq2seq/run_seq2seq.py
+++ b/examples/legacies/examples_seq2seq/run_seq2seq.py
--- a/examples/legacies/examples_seq2seq/seq2seq_trainer.py
+++ b/examples/legacies/examples_seq2seq/seq2seq_trainer.py
--- a/examples/legacies/examples_seq2seq/trainers/init.py
+++ b/examples/legacies/examples_seq2seq/trainers/init.py
--- a/examples/legacies/examples_seq2seq/trainers/model_args.py
+++ b/examples/legacies/examples_seq2seq/trainers/model_args.py
--- a/examples/legacies/examples_seq2seq/trainers/seq2seq_trainer.py
+++ b/examples/legacies/examples_seq2seq/trainers/seq2seq_trainer.py
--- a/examples/legacies/examples_seq2seq/trainers/trainer.py
+++ b/examples/legacies/examples_seq2seq/trainers/trainer.py
--- a/examples/legacies/examples_seq2seq/trainers/trainer_args.py
+++ b/examples/legacies/examples_seq2seq/trainers/trainer_args.py
--- a/examples/legacies/examples_seq2seq/trainers/trainer_utils.py
+++ b/examples/legacies/examples_seq2seq/trainers/trainer_utils.py
--- a/examples/legacies/examples_seq2seq/utils/init.py
+++ b/examples/legacies/examples_seq2seq/utils/init.py
--- a/examples/legacies/examples_seq2seq/utils/utils.py
+++ b/examples/legacies/examples_seq2seq/utils/utils.py
--- a/examples/legacies/examples_text-classification/README.md
+++ b/examples/legacies/examples_text-classification/README.md
--- a/examples/legacies/examples_text-classification/configs/config_gen.py
+++ b/examples/legacies/examples_text-classification/configs/config_gen.py
@ -161,6 +161,20 @@ AllConfigs['adapter_roberta-base'].update({
                                "output_dir": "outputs/adapter/roberta-base/",
                            })
 AllConfigs['parallel_adapter_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base'])
 AllConfigs['parallel_adapter_roberta-base'].update({
                                "delta_type": "parallel_adapter",
                                "learning_rate": 3e-4,
                                "unfrozen_modules": [
                                    "deltas",
                                    "layer_norm",
                                    "final_layer_norm",
                                    "classifier",
                                ],
                                "bottleneck_dim":24,
                                "output_dir": "outputs/parallel_adapter/roberta-base/",
                            })
 AllConfigs['lora_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base'])
 AllConfigs['lora_roberta-base'].update({
                                "delta_type": "lora",
--- a/examples/legacies/examples_text-classification/configs/lora_roberta-base/lora_cola.json
+++ b/examples/legacies/examples_text-classification/configs/lora_roberta-base/lora_cola.json
--- a/examples/legacies/examples_text-classification/configs/lora_roberta-base/lora_mnli.json
+++ b/examples/legacies/examples_text-classification/configs/lora_roberta-base/lora_mnli.json
--- a/examples/legacies/examples_text-classification/configs/lora_roberta-base/lora_mrpc.json
+++ b/examples/legacies/examples_text-classification/configs/lora_roberta-base/lora_mrpc.json