first commit

This commit is contained in:
shengdinghu 2022-02-14 21:19:03 +08:00
commit b856ad0fb9
158 changed files with 13706 additions and 0 deletions

20
.gitignore vendored Normal file
View File

@ -0,0 +1,20 @@
data/
**/__pycache__/
logs/*
experiments/logs
!logs/.gitkeep
datasets/*
!datasets/*.sh
.vscode/
*.egg-info/
eggs/
.eggs/
*.egg
**.egg
build/
_build/
**/build/
outputs/
log.txt
**/DeltaHub/
*beans

29
.readthedocs.yaml Normal file
View File

@ -0,0 +1,29 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
# Set the version of Python and other tools you might need
build:
os: ubuntu-20.04
tools:
python: "3.9"
# You can also specify other tool versions:
# nodejs: "16"
# rust: "1.55"
# golang: "1.17"
# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/conf.py
# If using Sphinx, optionally build your docs in additional formats such as PDF
# formats:
# - pdf
# Optionally declare the Python requirements required to build your docs
python:
install:
- requirements: docs/requirements.txt

94
README.md Normal file
View File

@ -0,0 +1,94 @@
<div align="center">
<img src="https://s4.ax1x.com/2022/02/14/Hy7lAf.png" width="350px">
**An Open-Source Framework for Parameter-Efficient Tuning.**
------
<p align="center">
<a href="#Overview">Overview</a>
<a href="#installation">Installation</a>
<a href="#Supported-Models">Supported Models</a>
<a href="https://opendelta.readthedocs.io/">Docs</a>
<a href="https://docs.google.com/spreadsheets/d/1BIVa8ocAPga-u7rBOXLYaTfaJSjI1dWfwohmLjmFDrY/edit?usp=sharing">Performance</a>
</p>
</div>
![version](https://img.shields.io/badge/version-v0.1.0-blue)
## Overview
OpenDelta is a toolkit for parameter-efficient tuning methods (we dub them *delta tuning*), with which users can flexibly assign (or add) a small amount of parameters to update while keeping most parameters frozen. With OpenDelta, users can easily implement prefix-tuning, adapters, LoRA, or any other type of delta tuning with their preferred PTMs.
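For example, a minimal sketch of adding LoRA to a Hugging Face backbone (mirroring the basic usage shown in the docs; the backbone checkpoint and modified modules here are just illustrative choices):
```python
from transformers import AutoModelForSequenceClassification
from opendelta import LoraModel

model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base")
delta_model = LoraModel(backbone_model=model, modified_modules=["fc2"])  # attach LoRA to the chosen submodules
delta_model.freeze_module(exclude=["deltas"])  # freeze everything except the delta parameters
delta_model.log()  # visualize the modified backbone and other information
```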
## Installation
Create a virtual environment (optional)
```shell
conda create -n opendelta_env python=3.8
conda activate opendelta_env
```
### Using Pip
Our repo is tested on Python 3.6+ and PyTorch 1.8.1+. Install OpenDelta using pip as follows:
```shell
pip install opendelta
```
To play with the latest features, you can also install OpenDelta from source.
### Build from Source
```shell
git clone https://github.com/thunlp/OpenDelta.git
cd OpenDelta
```
#### Option 1: If you won't modify the code, run
```shell
python setup.py install
```
#### Option 2: If you want to modify the code, run
```shell
python setup.py develop
```
### Verified Supported Models
**You can try to use OpenDelta on any backbone model based on PyTorch.** However, there is a small chance that
the interface of the submodules of the backbone model is not supported. Therefore, we have verified some commonly
used models that OpenDelta is sure to support.
We will keep testing more and more emerging models.
Pull requests are welcome when you successfully apply OpenDelta to your own backbone model.
| | Lora | Bias<br>Tuning | Adapter<br>Houlsby | Adapter<br>Pfeiffer | Adapter<br>Drop | Adapter<br>Low-Rank | Compacter | Prefix<br>Tuning | Prompt<br>Tuning |
| --------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ----- | ----- |
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| BART | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| T5-3b(parallel)| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Deberta-v2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | |
| CTRL | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | |
| ViT | ✅ | | | | | | | | |
### Performance Checked Combination
Google sheet [here](https://docs.google.com/spreadsheets/d/1BIVa8ocAPga-u7rBOXLYaTfaJSjI1dWfwohmLjmFDrY/edit?usp=sharing)

20
docs/Makefile Normal file
View File

@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

35
docs/make.bat Normal file
View File

@ -0,0 +1,35 @@
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd

20
docs/readme.md Normal file
View File

@ -0,0 +1,20 @@
# OpenDelta Documentation
To build this documentation locally, please first install the [Sphinx](https://www.sphinx-doc.org/en/master/) packages.
```
pip install sphinx
pip install sphinx_rtd_theme
pip install sphinx_copybutton
pip install sphinx_toolbox
pip install myst_parser
```
Then install OpenDelta either from source or from pip. After that,
```
cd docs
make html
```
Then open the generated `docs/build/html/index.html` in your local browser.

13
docs/requirements.txt Normal file
View File

@ -0,0 +1,13 @@
sphinx_copybutton
sphinx_rtd_theme
sphinx_toolbox
torch
transformers
sentencepiece==0.1.96
tqdm==4.62.2
openprompt
loralib
decorator
rich
myst_parser
web.py

View File

@ -0,0 +1,268 @@
/* a, */
.wy-menu-vertical header,
.wy-menu-vertical p.caption,
.wy-nav-top .fa-bars,
.wy-menu-vertical a:hover,
/* Colors and text decoration.
For example, :black:`text in black` or :blink:`text blinking` in rST. */
/* .black {
color: black;
}
.gray {
color: gray;
}
.grey {
color: gray;
}
.silver {
color: silver;
}
.white {
color: white;
}
.maroon {
color: maroon;
}
.red {
color: red;
}
.magenta {
color: magenta;
}
.fuchsia {
color: fuchsia;
}
.pink {
color: pink;
}
.orange {
color: rgba(218, 135, 12, 0.897);
} */
/* .string {
color: rgb(172, 51, 44);
} */
/* .yellow {
color: yellow;
}
.lime {
color: lime;
}
.green {
color: green;
}
.olive {
color: olive;
}
.teal {
color: teal;
}
.cyan {
color: cyan;
}
.aqua {
color: aqua;
}
.blue {
color: blue;
}
.navy {
color: navy;
}
.purple {
color: purple;
}
.under {
text-decoration: underline;
}
.over {
text-decoration: overline;
}
.blink {
text-decoration: blink;
}
.line {
text-decoration: line-through;
}
.strike {
text-decoration: line-through;
}
.it {
font-style: italic;
}
.ob {
font-style: oblique;
}
.small {
font-size: small;
}
.large {
font-size: large;
}
.smallpar {
font-size: small;
} */
a:link {
color: rgb(141, 99, 224)
}
a:visited {
color: rgb(141, 99, 224)
}
a:hover {
color: rgb(147, 47, 218)
}
.rst-content code.literal
{
color: rgb(172, 49, 42) !important;
/* #5360f0 */
}
.rst-content tt.literal
{
color: #f06b53 !important;
}
/* #a153f0 */
/* inspired by sphinx press theme */
.wy-menu.wy-menu-vertical li.toctree-l1.current > a {
border-left: solid 15px rgb(150, 92, 232) !important;
text-indent: -15px;
border-top: none;
border-bottom: none;
}
.wy-menu.wy-menu-vertical li.toctree-l1.current > ul {
border-left: solid 15px #ddcaf7 !important;
}
/* inspired by sphinx press theme */
.wy-nav-side {
color: unset !important;
background: unset !important;
border-right: solid 1px #ccc !important;
}
.wy-side-nav-search,
.wy-nav-top,
.wy-menu-vertical li,
.wy-menu-vertical li a:hover,
.wy-menu-vertical li a
{
background: unset !important;
}
.wy-menu-vertical li.current a {
border-right: unset !important;
}
.wy-side-nav-search div,
.wy-menu-vertical a {
color: #404040 !important;
}
.wy-menu-vertical button.toctree-expand {
color: #333 !important;
}
.wy-nav-content {
max-width: unset;
}
.rst-content {
max-width: 900px;
}
.wy-nav-content .icon-home:before {
content: "Docs";
}
.wy-side-nav-search .icon-home:before {
content: "";
}
dl.field-list {
display: block !important;
}
dl.field-list > dt:after {
content: "" !important;
}
dl.field-list > dt {
display: table;
padding-left: 6px !important;
padding-right: 6px !important;
margin-bottom: 4px !important;
padding-bottom: 1px !important;
background: rgb(252, 237, 208);
border-left: solid 2px rgb(231, 181, 134);
}
dl.py.class>dt
{
color: rgba(17, 16, 17, 0.822) !important;
background: rgb(247, 234, 252) !important;
border-top: solid 2px #b620d0 !important;
}
dl.py.method>dt
{
background: rgb(250, 239, 241) !important;
border-left: solid 2px rgb(199, 83, 106) !important;
}
dl.py.attribute>dt,
dl.py.property>dt
{
background: rgba(194, 233, 248, 0.1) !important;
border-left: solid 2px #58b5cc !important;
}
.fa-plus-square-o::before, .wy-menu-vertical li button.toctree-expand::before,
.fa-minus-square-o::before, .wy-menu-vertical li.current > a button.toctree-expand::before, .wy-menu-vertical li.on a button.toctree-expand::before
{
content: "";
}
.rst-content .viewcode-back,
.rst-content .viewcode-link
{
font-size: 120%;
}

View File

@ -0,0 +1,7 @@
document.addEventListener("DOMContentLoaded", function(event) {
    // Clicking a current top-level sidebar entry toggles the visibility of its sub-menu.
    document.querySelectorAll(".wy-menu.wy-menu-vertical > ul.current > li > a").forEach(a => a.addEventListener("click", e => {
        const f = document.querySelector(".wy-menu.wy-menu-vertical > ul.current > li > ul");
        if (f.style.display == 'none') { f.style.display = 'block'; } else { f.style.display = 'none'; }
    }));
    // Replace the default header anchor marker with a link emoji.
    document.querySelectorAll(".headerlink").forEach(a => a.text = "\u{1F517}");
});

144
docs/source/conf.py Normal file
View File

@ -0,0 +1,144 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import sys
sys.path.insert(0, "../../")
import datetime
import sphinx_rtd_theme
import doctest
import opendelta
import opendelta.delta_models
# -- Project information -----------------------------------------------------
project = 'OpenDelta'
author = 'THUNLP OpenDelta Team'
copyright = '{}, {}, Licensed under the Apache License, Version 2.0'.format(datetime.datetime.now().year, author)
# The full version, including alpha/beta/rc tags
release = '0.1.1'
version = "0.1.1"
html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
doctest_default_flags = doctest.NORMALIZE_WHITESPACE
autodoc_member_order = 'bysource'
intersphinx_mapping = {'python': ('https://docs.python.org/', None),
"torch": ("https://pytorch.org/docs/stable/", None),}
html_show_sourcelink = True
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.doctest',
'sphinx.ext.intersphinx',
'sphinx.ext.mathjax',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'sphinx.ext.githubpages',
'sphinx_copybutton',
'sphinx_toolbox.collapse',
'myst_parser',
]
myst_enable_extensions = [
"html_image",
"colon_fence",
"html_admonition",
"amsmath",
"dollarmath",
]
source_suffix = {
'.rst': 'restructuredtext',
'.txt': 'markdown',
'.md': 'markdown',
}
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
# exclude_patterns = []
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
# html_theme = 'alabaster'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_theme_options = {
# 'collapse_navigation': False,
# 'display_version': True,
#'logo_only': False,
'navigation_depth': 2,
}
html_static_path = ['_static']
html_css_files = ['css/custom.css']
html_js_files = ['js/custom.js']
rst_context = {'opendelta': opendelta}
# rst_epilog = "\n.. include:: .special.rst\n"
add_module_names = False
def include_only_tagged(app, what, name, obj, skip, options):
    inclusion_tag_format = "[NODOC]"  # can be any pattern here, choose what works for you
    for tag in app.tags.tags:
        if obj.__doc__ is not None and not obj.__doc__.startswith(inclusion_tag_format):
            return False
    return True
def skip2(app, what, name, obj, skip, options):
    members = [
        '__init__',
        '__repr__',
        '__weakref__',
        '__dict__',
        '__module__',
    ]
    return True if name in members else skip
def skip(app, what, name, obj, skip, options):
    skip = include_only_tagged(app, what, name, obj, skip, options) or \
        skip2(app, what, name, obj, skip, options)
    return skip
def setup(app):
    def rst_jinja_render(app, docname, source):
        src = source[0]
        rendered = app.builder.templates.render_string(src, rst_context)
        source[0] = rendered
    app.connect('autodoc-skip-member', skip)
    app.connect("source-read", rst_jinja_render)

BIN
docs/source/imgs/t5lora.png Normal file
Binary file not shown
(additional binary image files under docs/source/imgs/ are not shown)

54
docs/source/index.md Normal file
View File

@ -0,0 +1,54 @@
OpenDelta's documentation!
=====================================
OpenDelta is a **Plug-and-play** library for parameter-efficient fine-tuning ([delta-tuning](WhatisDelta)) of pre-trained models.
## Essential Advantages:
- <span style="color:rgb(81, 217, 245);font-weight:bold">Clean:</span> No need to edit the backbone PTM's code.
- <span style="color:orange;font-weight:bold">Simple:</span> Migrating from full-model tuning to delta-tuning requires as little as 3 lines of code.
- <span style="color:green;font-weight:bold">Sustainable:</span> Most evolution of the external libraries doesn't require a new version of OpenDelta.
- <span style="color:red;font-weight:bold">Extendable:</span> Various PTMs can share the same delta-tuning code.
- <span style="color:purple;font-weight:bold">Flexible:</span> Able to apply delta-tuning to (almost) any position in the PTMs.
```{eval-rst}
.. toctree::
   :maxdepth: 1
   :caption: Getting Started

   notes/overview.md
   notes/installation.md
   notes/usage.md
   notes/visualization.md
   notes/saveload.md

.. toctree::
   :maxdepth: 1
   :caption: Advanced Usage

   notes/keyfeature.md
   notes/unifyname.md
   notes/autodelta.md
   notes/composition.md
   notes/pluginunplug.md
   notes/acceleration.md
   notes/explored_config.md
   notes/citation.md

.. toctree::
   :maxdepth: 2
   :caption: Package Reference

   modules/base
   modules/deltas
   modules/auto_delta
   modules/utils

Indices and tables
==================
* :ref:`genindex`
```

View File

@ -0,0 +1,14 @@
Auto Classes
======================================
AutoDeltaConfig
------------------------------------
.. autoclass:: opendelta.auto_delta.AutoDeltaConfig
   :members:

AutoDeltaModel
------------------------------------
.. autoclass:: opendelta.auto_delta.AutoDeltaModel
   :members:

View File

@ -0,0 +1,14 @@
Base Classes
======================================
BaseDeltaConfig
------------------------------------
.. autoclass:: opendelta.delta_configs.BaseDeltaConfig
   :members:

DeltaBase
------------------------------------
.. autoclass:: opendelta.basemodel.DeltaBase
   :members:

View File

@ -0,0 +1,46 @@
Delta Models
======================================
Lora
---------------------------------------
.. autoclass:: opendelta.LoraModel
   :members:

BitFit
---------------------------------------
.. autoclass:: opendelta.BitFitModel
   :members:

Adapter
---------------------------------------
.. autoclass:: opendelta.AdapterModel
   :members:

LowRankAdapter
---------------------------------------
.. autoclass:: opendelta.LowRankAdapterModel
   :members:

Compacter
---------------------------------------
.. autoclass:: opendelta.CompacterModel
   :members:

Prefix tuning
------------------------------------
.. autoclass:: opendelta.PrefixModel
   :members:

Soft Prompt Tuning
------------------------------------
.. autoclass:: opendelta.SoftPromptModel
   :members:

View File

@ -0,0 +1,45 @@
# Utils
## SaveLoadMixin
```{eval-rst}
.. autoclass:: opendelta.utils.saving_loading_utils.SaveLoadMixin
   :members:
```
## Visualization
```{eval-rst}
.. autoclass:: opendelta.utils.visualization.Visualization
   :members:
```
## Structure Map
```{eval-rst}
.. autoclass:: opendelta.utils.structure_mapping.CommonStructureMap
   :members:
```
## Utility Functions
### Hashing
```{eval-rst}
.. automodule:: opendelta.utils.model_md5
   :members:
```
### Signature
```{eval-rst}
.. automodule:: opendelta.utils.signature
   :members:
```
### Name-based addressing
```{eval-rst}
.. automodule:: opendelta.utils.name_based_addressing
   :members:
```

View File

@ -0,0 +1,6 @@
(acceleration)=
# OpenDelta+
<img src="../imgs/todo-icon.jpeg" height="30px"> We are working on testing and improving the functionality with work with other acceleration packages for model training and inference. For example, [deepspeed](https://github.com/microsoft/DeepSpeed), [BMInf](https://github.com/OpenBMB/BMInf).
Feel free to contact us via email (shengdinghu@gmail.com) if you have any suggestion.

View File

@ -0,0 +1,67 @@
(autodelta)=
# AutoDelta Mechanism
Inspired by [Huggingface transformers AutoClasses](https://huggingface.co/docs/transformers/v4.16.2/en/model_doc/auto#transformers.AutoModel), we provide AutoDelta features that allow users to
1. easily experiment with different delta models;
2. quickly deploy from a configuration file, especially from the repos in [DeltaHub](https://huggingface.co/DeltaHub).
## Easily load from a dict, making it easy to change the type of delta model
```python
from opendelta import AutoDeltaConfig, AutoDeltaModel
from transformers import T5ForConditionalGeneration
backbone_model = T5ForConditionalGeneration.from_pretrained("t5-base")
```
We can load a config from a dict
```python
config_dict = {
"delta_type":"lora",
"modified_modules":[
"SelfAttention.q",
"SelfAttention.v",
"SelfAttention.o"
],
"lora_r":4}
delta_config = AutoDeltaConfig.from_dict(config_dict)
```
Then use the config to add a delta model to the backbone model
```python
delta_model = AutoDeltaModel.from_config(delta_config, backbone_model=backbone_model)
# now visualize the modified backbone_model
from opendelta import Visualization
Visualization(backbone_model).structure_graph()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/t5lora.png
---
width: 600px
name: t5lora
---
```
````
## Quickly deploy from a finetuned delta checkpoint on DeltaHub
```python
delta_model = AutoDeltaModel.from_finetuned("DeltaHub/sst2-t5-base", backbone_model=backbone_model) # TODO: the link may change.
```
<div class="admonition note">
<p class="title">**Hash checking**</p>
<p>
Since the delta model only works together with the backbone model, we will automatically check whether you load the delta model the same way it was trained.
</p>
<p>
We calculate the trained model's [md5](http://some_link) and save it to the config. After loading the delta model, we re-calculate the md5 to see whether it has changed.
</p>
<p>Pass `check_hash=False` to disable the hash checking.</p>
</div>
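For illustration, a minimal sketch of how such a parameter hash could be computed (only a sketch, not the actual `opendelta.utils.model_md5` implementation):
```python
import hashlib
import torch

def state_dict_md5(model: torch.nn.Module) -> str:
    """Hash a model's state dict; illustration only."""
    md5 = hashlib.md5()
    for name, tensor in sorted(model.state_dict().items()):
        md5.update(name.encode("utf-8"))
        md5.update(tensor.detach().cpu().numpy().tobytes())
    return md5.hexdigest()
```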

View File

@ -0,0 +1,3 @@
# Citation
<img src="../imgs/todo-icon.jpeg" height="30px"> We are working on a technical report.

View File

@ -0,0 +1,52 @@
(composition)=
# Composition of delta models
With OpenDelta, you can perform composition of different delta models.
### Add different deltas to the backbone
```
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("roberta-base")
from opendelta import LoraModel, AdapterModel
delta_model = LoraModel(backbone_model=model, modified_modules=['key'], lora_r=1)
delta_model2 = AdapterModel(backbone_model=model, modified_modules=['output'], bottleneck_dim=12)
delta_model.log()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/composition_of_delta.png
---
width: 600px
name: defaultmodification
---
```
````
### Even add multiple deltas to the same layer
```
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base")
from opendelta import AdapterModel, LowRankAdapterModel
delta_model = AdapterModel(backbone_model=model, modified_modules=['fc2'])
delta_model2 = AdapterModel(backbone_model=model, modified_modules=['fc2'], bottleneck_dim=12)
delta_model3 = LowRankAdapterModel(backbone_model=model, modified_modules=['fc2'], reduction_factor=12)
delta_model.log()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/multiple_to_one_layer.png
---
width: 600px
name: defaultmodification
---
```
````
:::{admonition} Order of Insertion
:class: warning
**When adding deltas to the same layer, please pay attention to the order of insertion.** In the above example, the deltas are added after `fc2`, so the tensor will first go through `adapter`, then `adapter_1`, and finally the low-rank adapter. If the deltas are added before the backbone layer, then the last added delta will be the first to go through.
Also, pay attention to the detaching order: the delta that is added first should be the last to be detached.
:::
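Continuing the example above, a short sketch of detaching in the reverse order of insertion (using the three delta models created in the previous snippet):
```python
# Detach in the reverse order of insertion.
delta_model3.detach()  # the low-rank adapter, added last
delta_model2.detach()  # the second adapter
delta_model.detach()   # the first adapter
delta_model.log()      # inspect the backbone after detaching
```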

View File

@ -0,0 +1,11 @@
(favoredconfiguration)=
# Favored Configuration
<img src="../imgs/todo-icon.jpeg" height="30px"> We will add the commonly used configuration of delta models HERE in future.
E.g.
- the modified_modules (position of delta),
- hyperparameter that are the most efficient
- the favored composition between delta models
Currenlty, use the default setting, explore it by yourself, or refer to existing papers' configuration!

View File

@ -0,0 +1,24 @@
(installation)=
# Installation
OpenDelta is tested on [Python 3.8](https://www.python.org/) and [PyTorch 1.9](https://pytorch.org/).
```bash
pip install opendelta
```
or from source
```bash
git clone https://github.com/thunlp/OpenDelta.git
cd OpenDelta
python setup.py install
```
If you want to make some modifications to the code for your research, run
```bash
git clone https://github.com/thunlp/OpenDelta.git
cd OpenDelta
python setup.py develop
```

View File

@ -0,0 +1,200 @@
(keyfeature)=
# Philosophy and Key Features
:::{admonition} Plug-and-play Design.
:class: tip
An existing open-source project that propagates this **''delta-tuning''** paradigm is
<a href="https://adapterhub.ml">AdapterHub</a>, which copies the transformers code base and modifies it, making it unintuitive to move from a normal code base to a delta-tuning one.
OpenDelta approaches this problem in a **true plug-and-play** fashion for the PLMs. To migrate from a full-model finetuning training script to a delta-tuning training script, you **DO NOT** need to change the backbone model's code base to an adapted one.
:::
Here is how we achieve it.
<img src="../imgs/pointing-right-finger.png" height="30px"> **Reading through this section will also help you implement your own delta models in a sustainable way.**
(namebasedaddr)=
## 1. Name-based submodule addressing.
We locate the submodules to which we want to apply a delta layer via name-based addressing.
In PyTorch, a submodule can be accessed from the root model via 'dot' addressing. For example, we define a toy language model:
```python
import torch.nn as nn
class MyNet1(nn.Module):
    def __init__(self,):
        super().__init__()
        self.name_a = nn.Linear(5, 5)
    def forward(self, hiddens):
        return self.name_a(hiddens)
class MyNet2(nn.Module):
    def __init__(self,):
        super().__init__()
        self.embedding = nn.Embedding(10, 5)
        self.name_b = nn.Sequential(MyNet1(), MyNet1())
    def forward(self, input_ids):
        hiddens = self.embedding(input_ids)
        return self.name_b(hiddens)
root = MyNet2()
print(root.name_b[0].name_a)
# Linear(in_features=5, out_features=5, bias=True)
```
We can visualize the model (For details, see [visualization](visualization))
```python
from opendelta import Visualization
Visualization(root).structure_graph()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/name_based_addressing.png
---
width: 500px
name: name_based_addressing
---
```
````
In this case, the string `"name_b.0.name_a"` is the name used to address the submodule from the root model.
Thus, we use it when applying a delta model to this toy net:
```
from opendelta import AdapterModel
AdapterModel(backbone_model=root, modified_modules=['name_b.0.name_a'])
Visualization(root).structure_graph()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/toy-delta.png
---
width: 500px
name: toy-delta
---
```
````
### Making addressing easier.
Handcrafting the full names of submodules can be frustrating, so we made some simplifications.
1. End-matching rules.
OpenDelta will take every module that
**ends with** the provided name suffix as the modification [target module](target_module).
:::{admonition} Example
:class: tip
Taking DistilBERT with a classifier on top as an example:
- Setting `modified_modules` to `["0.attention.out_lin"]` will add delta modules to the attention output of DistilBERT's
layer 0, i.e., `distilbert.transformer.layer.0.attention.out_lin`.
- Setting it to `["attention.out_lin"]` will add delta modules to every layer's `attention.out_lin`.
:::
2. Regular expressions.
<img src="../imgs/todo-icon.jpeg" height="30px"> Unit tests and docs to come later.
3. Interactive selection.
We provide a way to visually and interactively select the modules you need.
```python
from transformers import BertForMaskedLM
model = BertForMaskedLM.from_pretrained("bert-base-cased")
# suppose we load BERT
from opendelta import LoraModel # use lora as an example, others are same
delta_model = LoraModel(backbone_model=model, interactive_modify=True)
```
By setting `interactive_modify`, a web server will be opened on localhost, and the link will be printed in the terminal.
```
http://0.0.0.0:8888/
```
If you are on your local machine, click the link to open the interactive modification page.
If you are on a remote host, you can use port mapping. For example, the VS Code terminal automatically does port mapping for you, so you can simply `control/command + click` the link to open it.
If the default port is occupied by another program, you can change it by setting `interactive_modify=port_number`, where `port_number` is an integer.
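For example, a sketch of specifying a custom port (the port value here is arbitrary):
```python
delta_model = LoraModel(backbone_model=model, interactive_modify=8899)  # serve the selection page on port 8899 instead of the default
```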
The web page looks like the following figure.
```{figure} ../imgs/interact.jpg
---
width: 500px
name: interact web page
---
```
- Click `[+]`/`[-]` to expand or collapse tree nodes.
- Click on the text to select tree nodes; a **yellow dotted** box indicates the selection.
- **Double-click** on the pink `[*]` as an advanced option to unfold repeated nodes. By default, modules with the same architecture are folded into one node and marked in red; for example, the `BertLayer`s of layers 0~11 in the above figure share the same structure. Regular modifications will make the same change to each of these layers.
- If you want to change only a few of them, first double-click on `[*]`, then select the parts you want in the unfolded structure.
- If you want to make the same change to all but a few of them, first select the common parts you want in the folded structure, then double-click on `[*]` and remove the few positions you don't need to change in the expanded structure.
Click the `submit` button in the top-right corner, then go back to your terminal. You will get a list of name-based addresses printed in the following format; these are the modules to which the delta will be applied.
```
modified_modules:
[bert.encoder.layer.0.output.dense, ..., bert.encoder.layer.11.output.dense]
```
## 2. Three basic submodule-level delta operations.
We use three key functions to modify the backbone model from outside the backbone model's code.
1. **Unfreeze some parameters**
Some delta models unfreeze part of the model parameters and freeze the rest, e.g., [BitFit](https://arxiv.org/abs/2106.10199). For these methods, just use the [freeze_module](opendelta.basemodel.DeltaBase.freeze_module) method and pass the delta parts into `exclude` (see the user-level sketch after this list).
2. **Replace a module**
Some delta models replace part of the model with a delta module, i.e., the hidden states will no longer go through the original submodule. This includes [Lora](https://arxiv.org/abs/2106.09685).
For these methods, we have a [replace_module](opendelta.basemodel.DeltaBase.replace_module) interface.
3. **Insertion into the backbone**
- **Sequential insertion**
Most adapter models insert a new adapter layer after/before the original transformer blocks. For these methods, insert the adapter's forward function after/before the original layer's forward function using the [insert_sequential_module](opendelta.basemodel.DeltaBase.insert_sequential_module) interface.
- **Parallel insertion**
Adapters can also be used in a parallel fashion (see the [paper](https://arxiv.org/abs/2110.04366)).
For these methods, use the [insert_parallel_module](opendelta.basemodel.DeltaBase.insert_parrellel_module) interface.
:::{admonition} Doc-preserving Insertion
:class: note
In the insertion operations, the replaced forward function will inherit the doc strings of the original functions.
:::
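As a user-level sketch of the first operation, this is roughly what freezing everything except the delta parameters looks like (the BitFit model and BERT backbone are borrowed from examples elsewhere in these docs):
```python
from transformers import BertForMaskedLM
from opendelta import BitFitModel

model = BertForMaskedLM.from_pretrained("bert-base-cased")
delta_model = BitFitModel(backbone_model=model)  # unfreeze/add bias terms at the default positions
delta_model.freeze_module(exclude=["deltas"])    # freeze the rest of the backbone
delta_model.log()
```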
## 3. Pseudo input to initialize.
Some delta models, especially those newly introduced into the backbone, need to determine the shapes of their parameters. To get the shapes, we pass a pseudo input through the backbone model and determine the shape of each delta layer so that tensors can flow through smoothly.
:::{admonition} Pseudo Input
:class: warning
Most models in [Huggingface Transformers](https://huggingface.co/docs/transformers/index) have an attribute [dummy_inputs](https://github.com/huggingface/transformers/blob/v4.16.2/src/transformers/modeling_utils.py#L464). This will create a nonsensical input with the correct format to pass into the model's forward function.
For models that don't inherit/implement this attribute, we assume the pseudo input to the model is something like `input_ids`, i.e., an integer tensor.
```python
pseudo_input = torch.tensor([[0,0,0]])
# or
pseudo_input = torch.tensor([0,0,0])
```
<img src="../imgs/todo-icon.jpeg" height="30px"> We will add interface to allow more pseudo input in the future.
:::

View File

@ -0,0 +1,2 @@

View File

@ -0,0 +1,36 @@
# What is Delta-tuning and Why OpenDelta?
(WhatisDelta)=
:::{admonition} What is Delta?
:class: tip
As pre-trained language models (PLMs) have become the fundamental infrastructure for many NLP tasks and benchmarks, it is increasingly clear from recent research that **larger models tend to lead to better performance**. However, large-scale PLMs also bring prohibitive adaptation costs when fine-tuning all the parameters of a model and retaining separate instances for different tasks.
**Parameter-efficient model stimulation methods** have thus attracted researchers' attention. These methods tune only a small fraction of the model parameters while achieving performance comparable to or even better than full-model fine-tuning, and are dubbed "Delta-tuning".
**Delta** thus refers to a small fraction $\Delta\Theta$ of parameters besides the pretrained model $\Theta_0$.
\begin{gather*}
\Theta \sim \Theta_0\text{(frozen)} + \Delta\Theta\text{(tunable)}
\end{gather*}
This open-source project implements several delta-tuning methods, allowing researchers and engineers to quickly migrate their code from full-model tuning to delta-tuning without replacing the backend (the implementation of the backbone PLM).
:::
## Why OpenDelta?
- <span style="color:rgb(81, 217, 245);font-weight:bold">Clean:</span> No need to edit the backbone PTMs codes.
- <span style="color:orange;font-weight:bold">Simple:</span> Migrating from full-model tuning to delta-tuning needs as little as 3 lines of codes.
- <span style="color:green;font-weight:bold">Sustainable:</span> Most evolution in external library doesnt require a new OpenDelta.
- <span style="color:red;font-weight:bold">Extendable:</span> Various PTMs can share the same delta-tuning codes.
- <span style="color:purple;font-weight:bold">Flexible:</span> Able to apply delta-tuning to (almost) any position of the PTMs.
## Delta-tuning papers
<img src="../imgs/todo-icon.jpeg" height="30px">

View File

@ -0,0 +1,113 @@
# Multitask Modeling using OpenDelta
:::{admonition} Multitask Serving with Delta-tuning
:class: tip
A huge advantage of delta-tuning is that it can be used for multitask serving.
Imagine we have a pretrained model trained on a mix of data from multiple languages, e.g., English, Chinese, and French. Now you want separate models that specialize in Chinese, French, and English. We can thus delta-tune three deltas, one per language, with a small amount of additional language-specific data. During serving, when a Chinese sentence comes in, you attach the "Chinese Delta"; when a French sentence comes next, you detach the "Chinese Delta" and attach the "French Delta".
:::
**Here is how to achieve multitask serving using OpenDelta.**
```python
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base")
from opendelta import LoraModel
delta_model = LoraModel(backbone_model=model, modified_modules=['fc2'])
delta_model.log()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/plugunplug1.png
---
width: 800px
name: defaultmodification
---
```
````
Now we detach the deltas from the backbone
```python
delta_model.detach()
delta_model.log()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/plugunplug2.png
---
width: 800px
name: defaultmodification
---
```
````
We can reattach the deltas to the backbone
```python
delta_model.attach()
delta_model.log()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/plugunplug3.png
---
width: 800px
name: defaultmodification
---
```
````
:::{admonition} Independence of Different Delta Models
:class: note
Different delta models will be independent in detaching and attaching.
(But the visualization will not show all deltas in the backbone model.)
```python
# continue from the above example
from opendelta import AdapterModel
delta_model2 = AdapterModel(backbone_model=model, modified_modules=['fc1'])
delta_model2.log()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/plugunplug4.png
---
width: 800px
name: defaultmodification
---
```
````
detach the lora delta
```python
delta_model.detach() # detach the lora delta
delta_model.log()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/plugunplug5.png
---
width: 800px
name: defaultmodification
---
```
````
detach the adapter delta and reattach the lora delta
```python
delta_model2.detach() # detach the adapter delta
delta_model.attach() # reattach the lora delta
delta_model.log()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/plugunplug6.png
---
width: 800px
name: defaultmodification
---
```
````
:::
:::{admonition} BitFit not supported
:class: warning
<img src="../imgs/todo-icon.jpeg" height="30px"> Currently detach is not suitable for BitFit, which modify the requires_grad property. Please wait for future releases.
:::

View File

@ -0,0 +1,98 @@
(saveload)=
# Save and Share the Delta
## Space-efficient saving without changing the code.
After a modified backbone model is trained, you can save only the trained part without changing any code, because **the state dict of the backbone model has been changed to contain only the trainable parts**.
```python
from opendelta import CompacterModel
from transformers import BertForMaskedLM
backbone_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
delta_model = CompacterModel(backbone_model) # modify the default modules.
# freeze module
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
# or
delta_model.freeze_module(exclude=["deltas"])
```
### save the checkpoint.
Now save the `backbone_model` in the normal way; the checkpoint is **very space-efficient**.
```python
# ...
# After some training pipeline
# ...
torch.save(backbone_model.state_dict(), "delta.ckpt")
# the checkpoint size
import os
print("checkpoint size: {:.2f}M".format(os.path.getsize("delta.ckpt")/1024**2))
# checkpoint size: 0.32M
```
### load the checkpoint.
In order to load the checkpoint, you should make sure the backbone model is a modified one (so that it can take in the delta parameters).
Then load the checkpoint with `strict=False`.
```python
backbone_model.load_state_dict(torch.load("delta.ckpt"), strict=False)
# This will return a long string of warnings about 'missing keys'.
# If you want to suppress it, use
# _ = backbone_model.load_state_dict(torch.load("delta.ckpt"), strict=False)
```
## Save/Load the entire model after training.
### save a delta model.
```python
delta_model.save_finetuned("delta_model")
# Configuration saved in delta_model/config.json
# Model weights saved in delta_model/pytorch_model.bin
```
This will save all the trained parameters and the configuration of the delta model to path `delta_model/`
### load a delta model.
```python
backbone_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
delta_model.from_finetuned("delta_model", backbone_model, local_files_only=True)
# Passing local_files_only=True skips checking the web and saves time.
```
## Share or download a model to/from the community.
### Share.
```python
delta_model.save_finetuned("test_delta_model", push_to_hub = True)
```
### Download from community.
```python
from transformers import AutoModelForSeq2SeqLM
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
from opendelta import AutoDeltaModel
delta = AutoDeltaModel.from_finetuned("DeltaHub/lora_t5-base_mrpc", backbone_model=t5)
delta.log()
```
<div class="admonition tip">
<p class="title">**Push to Hub**</p>
<p> Currently we only provide the option to push to the Hugging Face Model Hub.</p>
<p> Before pushing to the hub, you may need to register an account on Hugging Face. You can refer to this [tutorial about model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing).
</p>
<p> In some cases, your checkpoint may still be too large for git; please install [`git-lfs`](https://git-lfs.github.com).
</p>
</div>
:::{admonition} **Sharing with the Community**
:class: tip
If you are satisfied with your checkpoint, do not forget to share your model with <a href="https://huggingface.co/DeltaHub">DeltaHub</a>:
1. Add yourself to DeltaHub with the [public link](https://huggingface.co/organizations/DeltaHub/share/QzkBuLSmlVnNhQqHYnekoTXwSRkoRHBwZA).
2. Be sure to edit your model card to clearly illustrate the delta model before you share it.
3. Click `setting` on the model.
4. Transfer the model in the `rename or transfer this model` section.
:::
## Save & Load for Composition of Delta
<img src="../imgs/todo-icon.jpeg" height="30px"> Currently save & load method is not suitable for [composition of delta model](compositon). Please wait for future releases.

View File

@ -0,0 +1,82 @@
(unifyname)=
# Unified Name Convention
```{figure} ../imgs/transformers_structure.png
:width: 400px
:name: transformers_structure
```
Although different PTMs often share a similar Transformer structure, their codebases, and most importantly the variable names of their submodules, are quite different.
On the one hand, we **encourage users to first [visualize](visualization) the PTM's structure and then determine the names of the submodules.**
On the other hand, we designed a unified name convention for the Transformer structure and provide several structure mappings from the original names to the unified name convention.
In this section, we illustrate the unified name convention and the structure mapping.
## Common blocks in the Transformer structure.
- embeddings (word embedding)
- encoder
- block
- $ (layer_id)
- attn
- q, k, v
- proj
- layer_norm
- ff
- w1
- w2
- layer_norm
- decoder (similar to encoder)
- lm_head
- proj
Visualizing bert-base using the common structure names: the submodules that are not common are grey.
```{figure} ../imgs/commonstructure_vis.png
:width: 600px
:name: transformers_structure
```
(commonstructure)=
## Mappings
Example of bert mapping: a tree with node names specified by <span style="font-weight:bold;color:rgb(55, 125, 34);" >"\_\_name\_\_"</span>
```json
{
"bert.embeddings.word_embeddings": {"__name__":"embeddings"},
"bert.embeddings.position_embeddings": {"__name__":""},
"bert.embeddings.token_type_embeddings": {"__name__":""},
"bert.embeddings.LayerNorm": {"__name__":""},
"bert.encoder": {"__name__":"encoder",
"layer": {"__name__":"block",
"$": {"__name__":"$",
"attention": {"__name__":"attn",
"self.query": {"__name__":"q"},
"self.key": {"__name__":"k"},
"self.value": {"__name__":"v"},
"output.dense": {"__name__":"proj"},
"output.LayerNorm": {"__name__":"layer_norm"},
},
"output": {"__name__":"ff",
"dense": {"__name__":"w2"},
"LayerNorm": {"__name__":"layer_norm"}
},
"intermediate.dense": {"__name__":"ff.w1"},
}
}
},
"cls.predictions": {"__name__": "lm_head",
"transform.dense": {"__name__":""},
"transform.LayerNorm": {"__name__":""},
"decoder": {"__name__":"proj"},
}
}
```

137
docs/source/notes/usage.md Normal file
View File

@ -0,0 +1,137 @@
(basics)=
# Basic Usage
Now we introduce the general pipeline to migrate your full-model tuning scripts to delta-tuning ones.
## STEP 1: Load the pretrained models
```python
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-base") # suppose we load BART
```
## STEP 2: Add delta modules
We provide two alternatives to add the delta modules.
### 2.1 Modification based on visualization
Suppose we want to use the feedforward layer of each block as our [modification target module](target_module).
We should first find out the name of the feedforward layer in the BART model by visualization. <img src="../imgs/hint-icon-2.jpg" height="30px"> *For more about visualization, see [Visualization](visualization).*
```python
from opendelta import Visualization
Visualization(model).structure_graph()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/bart-base.png
---
width: 600px
name: bart-base
---
```
````
We can see from the structure graph that the feed-forward layers in BART are called `model.encoder.layers.$.fc1` and `model.encoder.layers.$.fc2`, where
`$` represents a number from 0 to 5. Since we want to apply the adapter after *all* the feed-forward layers, we specify `modified_modules=['fc2']`, which is the common suffix of the feed-forward layers.
<img src="../imgs/hint-icon-2.jpg" height="30px"> *For details about name-based addressing, see [Name-based submodule addressing](namebasedaddr).*
Other configurations, such as the `bottleneck_dim` of the Adapter, can be passed as keyword arguments.
```python
from opendelta import AdapterModel
delta_model = AdapterModel(backbone_model=model, modified_modules=['fc2'], bottleneck_dim=12)
delta_model.log() # This will visualize the backbone after modification and other information.
```
(target_module)=
:::{admonition} Target module
:class: note
For different delta methods, the operation applied to the modification target is different.
- Adapter-based methods: insert at the target module's forward function.
- BitFit: add bias to all allowed positions of the target module.
- Lora: substitute all the linear layers of the target module with [Lora.Linear](https://github.com/microsoft/LoRA/blob/main/loralib/layers.py#L92).
:::
### 2.2 Use the default modification.
We also provide default modifications of each delta method for some commonly used PTMs (e.g., BERT, RoBERTa, DistilBERT, T5, GPT-2), so users don't need to specify the submodules to modify.
The default modifications are achieved by a [common_structure mapping](commonstructure), that is, a mapping from a module's name to its name in a common Transformer structure. <img src="../imgs/hint-icon-2.jpg" height="30px"> *For details about the default modification, see [Unified Name Convention](unifyname).*
```python
# a separate example using BERT.
from transformers import BertForMaskedLM
from opendelta import AdapterModel
model = BertForMaskedLM.from_pretrained("bert-base-cased")
delta_model = AdapterModel(model) # This will apply adapter to the self-attn and feed-forward layer.
delta_model.log()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/defaultmodification.png
---
width: 600px
name: defaultmodification
---
```
````
:::{admonition} Delta model vs Backbone model
:class: note
The delta_model **CANNOT** be used alone; its [forward](opendelta.basemodel.DeltaBase.forward) is canceled. The training pipeline should be conducted on the backbone model (in the above example, that is `model`).
:::
:::{admonition} Try different positions
:class: tip
OpenDelta provides the flexibility to add deltas at different positions in the backbone model. For example, if you want to move the adapter in the above example to after the layer norm of the feed-forward layer, the code should be changed into
```python
# continue with the BART example, but not used later.
delta_model = AdapterModel(backbone_model=model, modified_modules=['final_layer_norm'], bottleneck_dim=12)
```
The performance may vary due to positional differences, but there is no academic guarantee that one will outperform the other.
:::
:::{admonition} Favored Configurations
:class: tip
Feel confused about the flexibility that OpenDelta brings? NO WORRY! We will add [Favored Configurations](favoredconfiguration) soon.
:::
## STEP 3: Freezing parameters
The main part of the backbone model is not automatically frozen (we may add this option in the future). To freeze the main part of the backbone model except the trainable parts (usually the delta parameters), use the [freeze_module](opendelta.basemodel.DeltaBase.freeze_module) method. The `exclude` field obeys the same name-based addressing rules as the `modified_modules` field.
```python
# continue with the BART example
delta_model.freeze_module(exclude=["deltas", "layernorm_embedding"], set_state_dict=True)
delta_model.log()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/afterfreeze.png
---
width: 600px
name: afterfreeze
---
```
````
Setting `set_state_dict=True` tells the method to change the `state_dict` of the `backbone_model` to maintain only the trainable parts.
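For example (mirroring the [save/load guide](saveload)), after freezing with `set_state_dict=True`, saving the backbone the normal way produces a checkpoint that contains only the trainable parts:
```python
import torch

torch.save(model.state_dict(), "delta.ckpt")  # only the trainable (delta) parameters are stored
```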
## STEP 4: Normal training pipeline
The **model** can then be trained with traditional training scripts (a minimal sketch follows the note below). Two things should be noted:
:::{admonition} Note
:class: note
1. No need to change the optimizer, since the optimizer will only calculate and store gradients for the parameters with `requires_grad=True`, and the `requires_grad` attribute has already been changed during the call to the [freeze_module](opendelta.basemodel.DeltaBase.freeze_module) method.
2. `model.eval()` or `model.train()` should be used when needed to set dropout, etc. The delta model doesn't touch those configurations.
:::
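A minimal sketch of such a pipeline (assuming `train_dataloader` is your own dataloader yielding batches that the backbone accepts, including labels):
```python
import torch

# No need to change the optimizer: frozen parameters have requires_grad=False,
# receive no gradients, and are therefore skipped by the optimizer.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
for batch in train_dataloader:   # placeholder: your own data pipeline
    outputs = model(**batch)     # the backbone model is trained as usual
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```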
## STEP 5: Save/Share the Delta Model
<img src="../imgs/hint-icon-2.jpg" height="30px"> *See [Save a delta model locally, or share it with the community](saveload).*

View File

@ -0,0 +1,125 @@
(visualization)=
# Visualize the Parameters
When OpenDelta makes modifications to a pretrained model (PTM), it is beneficial to know what your PTM looks like, especially the location of the parameters.
- **Before** applying OpenDelta, you can see **how to specify your modifications in terms of key addressing**.
- **After** the modification is done, you can check **whether your modification is what you expected**, for example, whether the positions of the delta
modules are as desired, or whether you froze the correct parameters.
Now let's begin to try the visualization utility.
## Visualization is NOT easy using PyTorch's native functions.
```python
from transformers import BertForMaskedLM
backbone_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
print(backbone_model)
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/raw_print.png
---
width: 600px
name: raw_print
---
```
````
The original presentation of models is **not tailored for repeated structures, big models, or parameter-centric tasks**.
## Using visualization from opendelta.
First, let's visualize all the parameters in the BERT model. As we can see, the structure inside a BERT model and the locations of all its parameters are neatly represented in a tree structure. (See the [color scheme](color_schema) for the colors.)
```python
from opendelta import Visualization
model_vis = Visualization(backbone_model)
model_vis.structure_graph()
```
<!-- ````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span> -->
```{figure} ../imgs/bert_vis.png
---
width: 600px
name: bert_vis
---
```
<!-- ```` -->
<div class="admonition note">
<p class="title">**Suggestion**</p>
We can easily reference a module according to the graph:
```python
print(backbone_model.bert.encoder.layer[0].intermediate)
```
When using OpenDelta on a new backbone model, it's better to first visualize the child module names (shown in white), and then designate the `modified_modules`.
</div>
## Now add a delta model and visualize the change.
```python
from opendelta import LowRankAdapterModel
delta_model = LowRankAdapterModel(backbone_model)
delta_model.freeze_module(exclude=["cls", "intermediate", "LayerNorm"])
Visualization(backbone_model).structure_graph()
```
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/bertdelta_vis.png
---
width: 600px
name: bertdelta_vis
---
```
````
(color_schema)=
<div class="admonition tip">
<div class="title">**Color Schema**</div>
<ul>
<li> The <span style="font-weight:bold;color:white;">white</span> part is the name of the module.</li>
<li> The <span style="font-weight:bold;color:green;">green</span> part is the module's type.</li>
<li> The <span style="font-weight:bold;color:blue;">blue</span> part is the tunable parameters, i.e., the parameters that require grad computation.</li>
<li> The <span style="font-weight:bold;color:grey;">grey</span> part is the frozen parameters, i.e., the parameters that do not require grad computation.</li>
<li> The <span style="font-weight:bold;color:red;">red</span> part is the structure that is repeated and thus folded.</li>
<li> The <span style="font-weight:bold;color:purple;">purple</span> part is the delta parameters inserted into the backbone model.</li>
</ul>
</div>
:::{admonition} Platform Sensitivity
:class: warning
Depending on the platform the code is running on, the colors may vary slightly.
:::
## We also provide the option to visualize the nodes without parameters.
```python
Visualization(backbone_model).structure_graph(keep_non_params=True)
```
Thus, the modules like dropout and activations are kept.
````{collapse} <span style="color:rgb(141, 99, 224);font-weight:bold;font-style:italic">Click to view output</span>
```{figure} ../imgs/bertdelta_noparam.png
---
width: 600px
name: bertdelta_noparam
---
```
````
:::{admonition} Order of the submodule
:class: warning
Currently, OpenDelta's Visualization visualizes the model based on PyTorch's `named_modules` method. That means the order of the presented submodules is the order in which they were added to the parent module, not necessarily the order in which tensors flow through them.
:::

25
examples/README.md Normal file
View File

@ -0,0 +1,25 @@
# Use Examples
This repo mainly contains several running scripts that use OpenDelta to conduct parameter-efficient training on various tasks.
**Note that we suggest adding OpenDelta to your existing scripts, instead of modifying your scripts to match the following examples. OpenDelta itself doesn't restrict the training pipeline, nor does it provide one.**
## tutorial
Several toy tutorials:
1. The scripts for docs/basic_usage
2. Using interactive module selection
3. Work with [OpenPrompt](https://github.com/thunlp/OpenPrompt)
## examples_text-classification
Modifies a huggingface text-classification example into a delta-tuning one.
Currently, GLUE datasets are supported in the scripts. RoBERTa-base is used for performance checking. Read the README.md inside the directory for detailed usage.
## examples_seq2seq
Modifies a huggingface sequence-to-sequence example into a delta-tuning one.
Currently, SuperGLUE and GLUE datasets are supported in the scripts. T5-base is used for performance checking. Read the README.md inside the directory for detailed usage.
## examples_image-classification
A toy example of using OpenDelta for a computer-vision pretrained model (ViT). Since ViT is an experimental feature in huggingface transformers, this example is subject to change at any moment.

View File

@ -0,0 +1,166 @@
<!---
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Use OpenDelta with the Vision Transformer (ViT)
This example builds on the [HuggingFace image classification examples]() by adding several lines to the original scripts.
## Usage
### 1. Install the necessary packages
```shell
pip install Pillow
pip install torchvision
pip install transformers==4.16.2
pip install datasets==1.18.0
```
### 2. Upgrade transformers to 4.10.0
### 3. Run
```bash
python run_image_classification.py configs/lora_beans.json
```
Do not forget to reinstall `datasets` 1.17.0 afterwards for the other examples. :)
## Possible Errors
1. Dataset connection error
- Solution 1: Open a Python console and run the failing command again; this may not always help.
- Solution 2: Download the dataset yourself on an Internet-connected machine, save it to disk, transfer it to your server, and finally load it with `load_from_disk` (see the sketch below).
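A minimal sketch of Solution 2, using the `beans` dataset and an illustrative local path (`save_to_disk` / `load_from_disk` are standard `datasets` utilities):

```python
# On a machine with Internet access:
from datasets import load_dataset
load_dataset("beans").save_to_disk("./beans_saved")   # then copy this folder to the server

# On the offline server:
from datasets import load_from_disk
ds = load_from_disk("./beans_saved")
```

You may also need to adapt the path used in `run_image_classification.py`; see the commented `load_from_disk` lines in that script.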
# Image classification examples
The following examples showcase how to fine-tune a `ViT` for image-classification using PyTorch.
## Using datasets from 🤗 `datasets`
Here we show how to fine-tune a `ViT` on the [beans](https://huggingface.co/datasets/beans) dataset.
👀 See the results here: [nateraw/vit-base-beans](https://huggingface.co/nateraw/vit-base-beans).
```bash
python run_image_classification.py \
--dataset_name beans \
--output_dir ./beans_outputs/ \
--remove_unused_columns False \
--do_train \
--do_eval \
--push_to_hub \
--push_to_hub_model_id vit-base-beans \
--learning_rate 2e-5 \
--num_train_epochs 5 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--logging_strategy steps \
--logging_steps 10 \
--evaluation_strategy epoch \
--save_strategy epoch \
--load_best_model_at_end True \
--save_total_limit 3 \
--seed 1337
```
Here we show how to fine-tune a `ViT` on the [cats_vs_dogs](https://huggingface.co/datasets/cats_vs_dogs) dataset.
👀 See the results here: [nateraw/vit-base-cats-vs-dogs](https://huggingface.co/nateraw/vit-base-cats-vs-dogs).
```bash
python run_image_classification.py \
--dataset_name cats_vs_dogs \
--output_dir ./cats_vs_dogs_outputs/ \
--remove_unused_columns False \
--do_train \
--do_eval \
--push_to_hub \
--push_to_hub_model_id vit-base-cats-vs-dogs \
--fp16 True \
--learning_rate 2e-4 \
--num_train_epochs 5 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 32 \
--logging_strategy steps \
--logging_steps 10 \
--evaluation_strategy epoch \
--save_strategy epoch \
--load_best_model_at_end True \
--save_total_limit 3 \
--seed 1337
```
## Using your own data
To use your own dataset, the training script expects the following directory structure:
```bash
root/dog/xxx.png
root/dog/xxy.png
root/dog/[...]/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/[...]/asd932_.png
```
Once you've prepared your dataset, you can run the script like this:
```bash
python run_image_classification.py \
--dataset_name nateraw/image-folder \
--train_dir <path-to-train-root> \
--output_dir ./outputs/ \
--remove_unused_columns False \
--do_train \
--do_eval
```
### 💡 The above will split the train dir into training and evaluation sets
- To control the split amount, use the `--train_val_split` flag.
- To provide your own validation split in its own directory, you can pass the `--validation_dir <path-to-val-root>` flag.
## Sharing your model on 🤗 Hub
0. If you haven't already, [sign up](https://huggingface.co/join) for a 🤗 account
1. Make sure you have `git-lfs` installed and git set up.
```bash
$ apt install git-lfs
$ git config --global user.email "you@example.com"
$ git config --global user.name "Your Name"
```
2. Log in with your HuggingFace account credentials using `huggingface-cli`
```bash
$ huggingface-cli login
# ...follow the prompts
```
3. When running the script, pass the following arguments:
```bash
python run_image_classification.py \
--push_to_hub \
--push_to_hub_model_id <name-your-model> \
...
```

View File

@ -0,0 +1,30 @@
{
"report_to": "none",
"dataset_name": "beans",
"output_dir": "./beans_outputs/",
"do_train": true,
"do_eval": true,
"num_train_epochs": 5,
"remove_unused_columns": false,
"per_device_train_batch_size": 8,
"per_device_eval_batch_size": 8,
"logging_strategy": "steps",
"logging_steps": 10,
"evaluation_strategy": "epoch",
"save_strategy": "epoch",
"load_best_model_at_end": true,
"save_total_limit": 3,
"seed": 1337,
"delta_type": "lora",
"modified_modules": [
"attention.query",
"attention.value"
],
"unfrozen_modules": [
"classifier",
"deltas"
],
"overwrite_output_dir": true,
"learning_rate": 5e-4
}

View File

@ -0,0 +1,89 @@
# coding=utf-8
# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Accuracy metric."""
from sklearn.metrics import accuracy_score
import datasets
_DESCRIPTION = """
Accuracy is the proportion of correct predictions among the total number of cases processed. It can be computed with:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
TP: True positive
TN: True negative
FP: False positive
FN: False negative
"""
_KWARGS_DESCRIPTION = """
Args:
predictions: Predicted labels, as returned by a model.
references: Ground truth labels.
normalize: If False, return the number of correctly classified samples.
Otherwise, return the fraction of correctly classified samples.
sample_weight: Sample weights.
Returns:
accuracy: Accuracy score.
Examples:
>>> accuracy_metric = datasets.load_metric("accuracy")
>>> results = accuracy_metric.compute(references=[0, 1], predictions=[0, 1])
>>> print(results)
{'accuracy': 1.0}
"""
_CITATION = """\
@article{scikit-learn,
title={Scikit-learn: Machine Learning in {P}ython},
author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
journal={Journal of Machine Learning Research},
volume={12},
pages={2825--2830},
year={2011}
}
"""
@datasets.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class Accuracy(datasets.Metric):
def _info(self):
return datasets.MetricInfo(
description=_DESCRIPTION,
citation=_CITATION,
inputs_description=_KWARGS_DESCRIPTION,
features=datasets.Features(
{
"predictions": datasets.Sequence(datasets.Value("int32")),
"references": datasets.Sequence(datasets.Value("int32")),
}
if self.config_name == "multilabel"
else {
"predictions": datasets.Value("int32"),
"references": datasets.Value("int32"),
}
),
reference_urls=["https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html"],
)
def _compute(self, predictions, references, normalize=True, sample_weight=None):
return {
"accuracy": float(
accuracy_score(references, predictions, normalize=normalize, sample_weight=sample_weight)
)
}

View File

@ -0,0 +1,3 @@
# torch>=1.5.0
torchvision>=0.6.0
datasets>=1.8.0

View File

@ -0,0 +1,392 @@
#!/usr/bin/env python
# coding=utf-8
# Copyright 2021 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
import sys
from dataclasses import dataclass, field
from typing import Optional
import datasets
import numpy as np
import torch
from datasets import load_dataset
from PIL import Image
from torchvision.transforms import (
CenterCrop,
Compose,
Normalize,
RandomHorizontalFlip,
RandomResizedCrop,
Resize,
ToTensor,
)
import transformers
from transformers import (
MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING,
AutoConfig,
AutoFeatureExtractor,
AutoModelForImageClassification,
HfArgumentParser,
Trainer,
TrainingArguments,
)
from transformers.trainer_utils import get_last_checkpoint
from transformers.utils import check_min_version
from transformers.utils.versions import require_version
""" Fine-tuning a 🤗 Transformers model for image classification"""
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.16.0.dev0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/image-classification/requirements.txt")
MODEL_CONFIG_CLASSES = list(MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)
def pil_loader(path: str):
with open(path, "rb") as f:
im = Image.open(f)
return im.convert("RGB")
@dataclass
class DataTrainingArguments:
"""
Arguments pertaining to what data we are going to input our model for training and eval.
Using ``HfArgumentParser`` we can turn this class
into argparse arguments to be able to specify them on
the command line.
"""
dataset_name: Optional[str] = field(
default="nateraw/image-folder", metadata={"help": "Name of a dataset from the datasets package"}
)
dataset_config_name: Optional[str] = field(
default=None, metadata={"help": "The configuration name of the dataset to use (via the datasets library)."}
)
train_dir: Optional[str] = field(default=None, metadata={"help": "A folder containing the training data."})
validation_dir: Optional[str] = field(default=None, metadata={"help": "A folder containing the validation data."})
train_val_split: Optional[float] = field(
default=0.15, metadata={"help": "Percent to split off of train for validation."}
)
max_train_samples: Optional[int] = field(
default=None,
metadata={
"help": "For debugging purposes or quicker training, truncate the number of training examples to this "
"value if set."
},
)
max_eval_samples: Optional[int] = field(
default=None,
metadata={
"help": "For debugging purposes or quicker training, truncate the number of evaluation examples to this "
"value if set."
},
)
def __post_init__(self):
data_files = dict()
if self.train_dir is not None:
data_files["train"] = self.train_dir
if self.validation_dir is not None:
data_files["val"] = self.validation_dir
self.data_files = data_files if data_files else None
class RemainArgHfArgumentParser(HfArgumentParser):
def parse_json_file(self, json_file: str, return_remaining_args=True ):
"""
Alternative helper method that does not use `argparse` at all, instead loading a json file and populating the
dataclass types.
"""
import argparse
import json
from pathlib import Path
import dataclasses
data = json.loads(Path(json_file).read_text())
outputs = []
for dtype in self.dataclass_types:
keys = {f.name for f in dataclasses.fields(dtype) if f.init}
inputs = {k: data.pop(k) for k in list(data.keys()) if k in keys}
obj = dtype(**inputs)
outputs.append(obj)
remain_args = argparse.ArgumentParser()
remain_args.__dict__.update(data)
if return_remaining_args:
return (*outputs, remain_args)
else:
return (*outputs,)
@dataclass
class ModelArguments:
"""
Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
"""
model_name_or_path: str = field(
default="google/vit-base-patch16-224-in21k",
metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"},
)
model_type: Optional[str] = field(
default=None,
metadata={"help": "If training from scratch, pass a model type from the list: " + ", ".join(MODEL_TYPES)},
)
config_name: Optional[str] = field(
default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
)
cache_dir: Optional[str] = field(
default=None, metadata={"help": "Where do you want to store the pretrained models downloaded from s3"}
)
model_revision: str = field(
default="main",
metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."},
)
feature_extractor_name: str = field(default=None, metadata={"help": "Name or path of preprocessor config."})
use_auth_token: bool = field(
default=False,
metadata={
"help": "Will use the token generated when running `transformers-cli login` (necessary to use this script "
"with private models)."
},
)
def collate_fn(examples):
pixel_values = torch.stack([example["pixel_values"] for example in examples])
labels = torch.tensor([example["labels"] for example in examples])
return {"pixel_values": pixel_values, "labels": labels}
def main():
# See all possible arguments in src/transformers/training_args.py
# or by passing the --help flag to this script.
# We now keep distinct sets of args, for a cleaner separation of concerns.
parser = RemainArgHfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
# If we pass only one argument to the script and it's the path to a json file,
# let's parse it to get our arguments.
model_args, data_args, training_args, delta_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
else:
model_args, data_args, training_args, delta_args = parser.parse_args_into_dataclasses()
# Setup logging
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
handlers=[logging.StreamHandler(sys.stdout)],
)
log_level = training_args.get_process_log_level()
logger.setLevel(log_level)
transformers.utils.logging.set_verbosity(log_level)
transformers.utils.logging.enable_default_handler()
transformers.utils.logging.enable_explicit_format()
# Log on each process the small summary:
logger.warning(
f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
+ f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
)
logger.info(f"Training/evaluation parameters {training_args}")
# Detecting last checkpoint.
last_checkpoint = None
if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir:
last_checkpoint = get_last_checkpoint(training_args.output_dir)
if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0:
raise ValueError(
f"Output directory ({training_args.output_dir}) already exists and is not empty. "
"Use --overwrite_output_dir to overcome."
)
elif last_checkpoint is not None and training_args.resume_from_checkpoint is None:
logger.info(
f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change "
"the `--output_dir` or add `--overwrite_output_dir` to train from scratch."
)
# Initialize our dataset and prepare it for the 'image-classification' task.
ds = load_dataset(
data_args.dataset_name,
data_args.dataset_config_name,
data_files=data_args.data_files,
cache_dir=model_args.cache_dir,
task="image-classification",
)
# If you encounter an error here, try to download the dataset yourself and load it from disk,
# as in the following two lines:
# from datasets import load_from_disk
# ds = load_from_disk(f"../../../../huggingface_datasets/saved_to_disk/{data_args.dataset_name}")
# If we don't have a validation split, split off a percentage of train as validation.
data_args.train_val_split = None if "validation" in ds.keys() else data_args.train_val_split
if isinstance(data_args.train_val_split, float) and data_args.train_val_split > 0.0:
split = ds["train"].train_test_split(data_args.train_val_split)
ds["train"] = split["train"]
ds["validation"] = split["test"]
# Prepare label mappings.
# We'll include these in the model's config to get human readable labels in the Inference API.
labels = ds["train"].features["labels"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
label2id[label] = str(i)
id2label[str(i)] = label
# Load the accuracy metric from the datasets package
# metric = datasets.load_metric("accuracy")
metric = datasets.load_metric("metric.py")
# Define our compute_metrics function. It takes an ``EvalPrediction`` object (a namedtuple with a
# predictions and label_ids field) and has to return a dictionary string to float.
def compute_metrics(p):
"""Computes accuracy on a batch of predictions"""
return metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)
config = AutoConfig.from_pretrained(
model_args.config_name or model_args.model_name_or_path,
num_labels=len(labels),
label2id=label2id,
id2label=id2label,
finetuning_task="image-classification",
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
model = AutoModelForImageClassification.from_pretrained(
model_args.model_name_or_path,
from_tf=bool(".ckpt" in model_args.model_name_or_path),
config=config,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
feature_extractor = AutoFeatureExtractor.from_pretrained(
model_args.feature_extractor_name or model_args.model_name_or_path,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
if delta_args.delta_type.lower() != "none":
from opendelta import AutoDeltaConfig,AutoDeltaModel
delta_config = AutoDeltaConfig.from_dict(vars(delta_args))
delta_model = AutoDeltaModel.from_config(delta_config, backbone_model=model)
delta_model.freeze_module(set_state_dict = True)
delta_model.log(delta_ratio=True, trainable_ratio=True, visualization=True)
# Define torchvision transforms to be applied to each image.
normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
_train_transforms = Compose(
[
RandomResizedCrop(feature_extractor.size),
RandomHorizontalFlip(),
ToTensor(),
normalize,
]
)
_val_transforms = Compose(
[
Resize(feature_extractor.size),
CenterCrop(feature_extractor.size),
ToTensor(),
normalize,
]
)
def train_transforms(example_batch):
"""Apply _train_transforms across a batch."""
example_batch["pixel_values"] = [
_train_transforms(pil_img.convert("RGB")) for pil_img in example_batch["image"]
]
return example_batch
def val_transforms(example_batch):
"""Apply _val_transforms across a batch."""
example_batch["pixel_values"] = [_val_transforms(pil_img.convert("RGB")) for pil_img in example_batch["image"]]
return example_batch
if training_args.do_train:
if "train" not in ds:
raise ValueError("--do_train requires a train dataset")
if data_args.max_train_samples is not None:
ds["train"] = ds["train"].shuffle(seed=training_args.seed).select(range(data_args.max_train_samples))
# Set the training transforms
ds["train"].set_transform(train_transforms)
if training_args.do_eval:
if "validation" not in ds:
raise ValueError("--do_eval requires a validation dataset")
if data_args.max_eval_samples is not None:
ds["validation"] = (
ds["validation"].shuffle(seed=training_args.seed).select(range(data_args.max_eval_samples))
)
# Set the validation transforms
ds["validation"].set_transform(val_transforms)
# Initialize our trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=ds["train"] if training_args.do_train else None,
eval_dataset=ds["validation"] if training_args.do_eval else None,
compute_metrics=compute_metrics,
tokenizer=feature_extractor,
data_collator=collate_fn,
)
# Training
if training_args.do_train:
checkpoint = None
if training_args.resume_from_checkpoint is not None:
checkpoint = training_args.resume_from_checkpoint
elif last_checkpoint is not None:
checkpoint = last_checkpoint
train_result = trainer.train(resume_from_checkpoint=checkpoint)
trainer.save_model()
trainer.log_metrics("train", train_result.metrics)
trainer.save_metrics("train", train_result.metrics)
trainer.save_state()
# Evaluation
if training_args.do_eval:
metrics = trainer.evaluate()
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)
# Write model card and (optionally) push to hub
kwargs = {
"finetuned_from": model_args.model_name_or_path,
"tasks": "image-classification",
"dataset": data_args.dataset_name,
"tags": ["image-classification"],
}
if training_args.push_to_hub:
trainer.push_to_hub(**kwargs)
else:
trainer.create_model_card(**kwargs)
if __name__ == "__main__":
main()

View File

@ -0,0 +1,64 @@
# Applying OpenDelta to GLUE/SuperGLUE tasks using the Seq2Seq Paradigm
## Install the repo
```bash
cd ../
python setup_seq2seq.py develop
```
This will add `examples_seq2seq` to the Python environment path.
## Generating the json configuration file
```
python config_gen.py --job $job_name
```
The available job configurations (e.g., `--job lora_t5-base`) can be found in `config_gen.py`. You can also
create your own configuration.
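As an illustration (the job name and values below are hypothetical, following the pattern of the existing entries), a new job could be registered in `config_gen.py`, which already imports `copy` and defines `AllConfigs` / `BaseConfigs`:

```python
# inside config_gen.py, next to the existing AllConfigs entries
AllConfigs['my_lora_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
AllConfigs['my_lora_t5-base'].update({
    "delta_type": "lora",
    "learning_rate": 3e-4,
    "unfrozen_modules": ["deltas", "layer_norm", "final_layer_norm"],
    "lora_r": 8,
    "output_dir": "outputs/my_lora/t5-base/",
})
```

Running `python config_gen.py --job my_lora_t5-base` should then write one json file per task into `./my_lora_t5-base/`.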
## Run the code
```
python run_seq2seq.py configs/$job_name/$dataset.json
```
## Possible Errors
1.
```
ValueError: You must login to the Hugging Face hub on this computer by typing `transformers-cli login` and entering your credentials to use `use_auth_token=True`. Alternatively, you can pass your own token as the `use_auth_token` argument.
```
- Solution 1: Register an account on [HuggingFace](https://huggingface.co/), then run `transformers-cli login` on your command line and enter your username and password.
- Solution 2: Disable pushing to the hub by setting `"push_to_hub": false` in the config json.
2.
```
OSError: Looks like you do not have git-lfs installed, please install. You can install from https://git-lfs.github.com/. Then run `git lfs install` (you only have to do this once).
```
- Solution 1:
```
wget -P ~ https://github.com/git-lfs/git-lfs/releases/download/v3.0.2/git-lfs-linux-amd64-v3.0.2.tar.gz
cd ~
tar -xvzf git-lfs-linux-amd64-v3.0.2.tar.gz
export PATH=~:$PATH
git-lfs install
```
- Solution 2: Disable pushing to the hub by setting `"push_to_hub": false` in the config json.
3. Dataset connection error
- Solution 1: Open a Python console and run the failing command again; this may not always help.
- Solution 2: Download the dataset yourself on an Internet-connected machine, save it to disk, transfer it to your server, and finally load it with `load_from_disk`.
## Link to the original training scripts
This example repo is based on the [compacter training scripts](https://github.com/rabeehk/compacter), with compacter-related lines removed. Thanks to the authors of the original repo. In addition, in private correspondence, the authors shared the code used to create the json configs. Thanks again for their efforts.

View File

View File

@ -0,0 +1,21 @@
# the final results will be populated here.{
"evaluate": {
"epoch": 20.0,
"eval_accuracy": 89.2156862745098,
"eval_average_metrics": 90.76168929110105,
"eval_f1": 92.3076923076923,
"eval_loss": 0.16493959724903107,
"eval_runtime": 1.6391,
"eval_samples_per_second": 124.455
},
"repo_name": "DeltaHub/bitfit_t5-base_mrpc",
"test": {
"epoch": 20.0,
"test_accuracy": 88.23529411764706,
"test_average_metrics": 89.97971602434077,
"test_f1": 91.72413793103448,
"test_loss": 0.14968213438987732,
"test_runtime": 1.6344,
"test_samples_per_second": 124.82
}
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "cola",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 20,
"output_dir": "outputs/bitfit/t5-base/cola",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "cola",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "cola",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "mnli",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 3,
"output_dir": "outputs/bitfit/t5-base/mnli",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "mnli",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "mnli",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "mrpc",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 20,
"output_dir": "outputs/bitfit/t5-base/mrpc",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "mrpc",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "mrpc",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "qnli",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 3,
"output_dir": "outputs/bitfit/t5-base/qnli",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "qnli",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "qnli",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "qqp",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 3,
"output_dir": "outputs/bitfit/t5-base/qqp",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "qqp",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "qqp",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "rte",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 20,
"output_dir": "outputs/bitfit/t5-base/rte",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "rte",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "rte",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "sst2",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 3,
"output_dir": "outputs/bitfit/t5-base/sst2",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "sst2",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "sst2",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "stsb",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 128,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 20,
"output_dir": "outputs/bitfit/t5-base/stsb",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "stsb",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "stsb",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-boolq",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 20,
"output_dir": "outputs/bitfit/t5-base/superglue-boolq",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "superglue-boolq",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-boolq",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-cb",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 20,
"output_dir": "outputs/bitfit/t5-base/superglue-cb",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "superglue-cb",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-cb",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-copa",
"eval_steps": 50,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 40,
"output_dir": "outputs/bitfit/t5-base/superglue-copa",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 50,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "superglue-copa",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-copa",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-multirc",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 3,
"output_dir": "outputs/bitfit/t5-base/superglue-multirc",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "superglue-multirc",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-multirc",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-record",
"eval_steps": 200,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 512,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 3,
"output_dir": "outputs/bitfit/t5-base/superglue-record",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 16,
"per_device_train_batch_size": 16,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 200,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "superglue-record",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-record",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-wic",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 20,
"output_dir": "outputs/bitfit/t5-base/superglue-wic",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "superglue-wic",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-wic",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,40 @@
{
"dataset_config_name": [
"en"
],
"delta_type": "bitfit",
"do_eval": true,
"do_test": true,
"do_train": true,
"eval_dataset_config_name": [
"en"
],
"eval_dataset_name": "superglue-wsc.fixed",
"eval_steps": 100,
"evaluation_strategy": "steps",
"greater_is_better": true,
"learning_rate": 0.0003,
"load_best_model_at_end": true,
"max_source_length": 256,
"metric_for_best_model": "average_metrics",
"model_name_or_path": "t5-base",
"num_train_epochs": 20,
"output_dir": "outputs/bitfit/t5-base/superglue-wsc.fixed",
"overwrite_output_dir": true,
"per_device_eval_batch_size": 32,
"per_device_train_batch_size": 32,
"predict_with_generate": true,
"push_to_hub": true,
"save_steps": 100,
"save_strategy": "steps",
"save_total_limit": 1,
"seed": 42,
"split_validation_test": true,
"task_name": "superglue-wsc.fixed",
"test_dataset_config_name": [
"en"
],
"test_dataset_name": "superglue-wsc.fixed",
"tokenizer_name": "t5-base",
"warmup_steps": 0
}

View File

@ -0,0 +1,230 @@
import collections
import copy
AllConfigs = {}
BaseConfigs = {}
BaseConfigs['t5-base'] = {
("job_name", "task_name", "eval_dataset_name", "test_dataset_name", "num_train_epochs",
"max_source_length",
"per_device_train_batch_size", "per_device_eval_batch_size", "warmup_steps","save_steps", "eval_steps"): zip(
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record",
"superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record", "superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record", "superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record", "superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
[ 20, 20, 40, 20, 3, 3, 20, 20, 20, 3, 3, 20, 3, 3, 20],
[256, 256, 256, 256, 256, 512, 256, 128, 128, 128, 128, 128, 128, 128, 128],
[ 32, 32, 32, 32, 32, 16, 32] + [32] * 8,
[ 32, 32, 32, 32, 32, 16, 32] + [32] * 8,
[0] *7 +[0] *8,
[200, 100, 50, 100, 200, 200, 100, 200, 100, 200, 200, 100, 200, 200, 100],
[200, 100, 50, 100, 200, 200, 100, 200, 100, 200, 200, 100, 200, 200, 100],
),
"do_train": True,
"do_eval": True,
"do_test": True,
"model_name_or_path": "t5-base",
"tokenizer_name": "t5-base",
"save_total_limit": 1,
# For glue datasets.
"split_validation_test": True,
"seed": 42,
"dataset_config_name": ["en"],
"eval_dataset_config_name": ["en"],
"test_dataset_config_name": ["en"],
# other configurations.
"predict_with_generate": True,
# To evaluate during training.
"load_best_model_at_end": True,
"metric_for_best_model": "average_metrics",
"greater_is_better": True,
"evaluation_strategy": "steps",
"overwrite_output_dir": True,
"push_to_hub": True,
"save_strategy": "steps"
}
AllConfigs['bitfit_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
AllConfigs['bitfit_t5-base'].update({
"delta_type": "bitfit",
"learning_rate": 3e-4,
"output_dir": "outputs/bitfit/t5-base/",
})
AllConfigs['adapter_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
AllConfigs['adapter_t5-base'].update({
"delta_type": "adapter",
"learning_rate": 3e-4,
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"bottleneck_dim":24,
"output_dir": "outputs/adapter/t5-base/",
})
AllConfigs['lora_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
AllConfigs['lora_t5-base'].update({
"delta_type": "lora",
"learning_rate": 3e-4,
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"lora_r": 8,
"output_dir": "outputs/lora/t5-base/",
})
AllConfigs['compacter_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
AllConfigs['compacter_t5-base'].update({
"delta_type": "compacter",
"learning_rate": 3e-3,
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"output_dir": "outputs/compacter/t5-base/",
"non_linearity": "gelu_new",
#Compacter.
"hypercomplex_division": 4,
"hypercomplex_adapters": True,
"hypercomplex_nonlinearity": "glorot-uniform",
# gradient clip and clamp
"gradient_clip": False,
"phm_clamp": False,
"normalize_phm_weight": False,
"learn_phm": True,
# shared one side
"factorized_phm": True,
"shared_phm_rule": False,
"factorized_phm_rule": False,
"phm_c_init": "normal",
"phm_init_range": 0.0001,
"use_bias_down_sampler": True,
"use_bias_up_sampler": True,
})
AllConfigs['compacter++_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
AllConfigs['compacter++_t5-base'].update({
"delta_type": "compacter",
"learning_rate": 3e-3,
"do_train": True,
"do_eval": True,
"do_test": True,
"modified_modules": [
"DenseReluDense"
],
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"output_dir": "outputs/compacter++/t5-base/",
"non_linearity": "gelu_new",
#Compacter.
"hypercomplex_division": 4,
"hypercomplex_adapters": True,
"hypercomplex_nonlinearity": "glorot-uniform",
# gradient clip and clamp
"gradient_clip": False,
"phm_clamp": False,
"normalize_phm_weight": False,
"learn_phm": True,
# shared one side
"factorized_phm": True,
"shared_phm_rule": False,
"factorized_phm_rule": False,
"phm_c_init": "normal",
"phm_init_range": 0.0001,
"use_bias_down_sampler": True,
"use_bias_up_sampler": True,
})
AllConfigs['low_rank_adapter_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
AllConfigs['low_rank_adapter_t5-base'].update({
"delta_type": "low_rank_adapter",
"learning_rate": 3e-4,
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm"
],
"output_dir": "outputs/low_rank_adapter/t5-base/",
"non_linearity": "gelu_new",
"low_rank_w_init": "glorot-uniform",
"low_rank_rank": 1,
})
AllConfigs['soft_prompt_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
AllConfigs['soft_prompt_t5-base'].update({
"delta_type": "soft_prompt",
"learning_rate": 3e-2,
"soft_token_num":100,
"token_init": False,
"unfrozen_modules": [
"deltas",
],
"output_dir": "outputs/soft_prompt/t5-base/",
})
AllConfigs['prefix_t5-base'] = copy.deepcopy(BaseConfigs['t5-base'])
AllConfigs['prefix_t5-base'].update({
"delta_type": "prefix",
"learning_rate": 3e-4,
"unfrozen_modules": [
"deltas",
],
"output_dir": "outputs/prefix/t5-base/",
})
if __name__ == "__main__":
import argparse
import json
import os
parser = argparse.ArgumentParser("Parser to generate configuration")
parser.add_argument("--job", type=str)
args = parser.parse_args()
config = AllConfigs[args.job]
Cartesian_product = []
for key in config:
if isinstance(key, tuple):
Cartesian_product.append(key)
all_config_jsons = {}
for key_tuple in Cartesian_product:
for zipped in config[key_tuple]:
job_name = zipped[0]
all_config_jsons[job_name] = {}
for key_name, zipped_elem in zip(key_tuple, zipped):
if key_name != 'job_name':
all_config_jsons[job_name][key_name] = zipped_elem
for key in config:
if not isinstance(key, tuple):
for job_name in all_config_jsons:
if key == "output_dir":
all_config_jsons[job_name][key] = config[key] + job_name
else:
all_config_jsons[job_name][key] = config[key]
if not os.path.exists(f"./{args.job}/"):
os.mkdir(f"./{args.job}/")
for job_name in all_config_jsons:
with open(f"./{args.job}/{job_name}.json", 'w') as fout:
json.dump(all_config_jsons[job_name], fout, indent=4,sort_keys=True)

View File

@ -0,0 +1,3 @@
from .tasks import TASK_MAPPING, AutoTask
from .data_collator import TaskDataCollatorForSeq2Seq
from .postprocessors import AutoPostProcessor

View File

@ -0,0 +1,16 @@
import numpy as np
from dataclasses import dataclass
from transformers import DataCollatorForSeq2Seq
@dataclass
class TaskDataCollatorForSeq2Seq(DataCollatorForSeq2Seq):
def check_uniqueness(self, samples):
assert len(np.unique(samples)) == 1
def __call__(self, features):
# tasks = [d.pop('task') for d in features]
# self.check_uniqueness(tasks)
output = super().__call__(features)
# output["task"] = tasks[0]
return output

View File

@ -0,0 +1,64 @@
import abc
from collections import OrderedDict
import numpy as np
"""Defines functions to process the outputs to make them ready for the evaluation."""
def string_to_float(string, default=-1., **unused_kwargs):
"""Converts string to float, using default when conversion not possible."""
try:
return float(string)
except ValueError:
return default
class PostProcessor(abc.ABC):
"""Postprocess the predictions and labels to make them suitable for
evaluation."""
def __init__(self, tokenizer, ignore_pad_token_for_loss):
self.tokenizer = tokenizer
self.ignore_pad_token_for_loss = ignore_pad_token_for_loss
def process(self, preds, labels, data_info=None):
if isinstance(preds, tuple):
preds = preds[0]
decoded_preds = self.tokenizer.batch_decode(preds, skip_special_tokens=True)
if self.ignore_pad_token_for_loss:
# Replace -100 in the labels as we can't decode them.
labels = np.where(labels != -100, labels, self.tokenizer.pad_token_id)
decoded_labels = self.tokenizer.batch_decode(labels, skip_special_tokens=True)
# Some simple post-processing
decoded_preds = [pred.strip() for pred in decoded_preds]
decoded_labels = [label.strip() for label in decoded_labels]
return decoded_preds, decoded_labels
class MultiRC(PostProcessor):
def process(self, preds, labels, data_info):
preds, labels = super().process(preds, labels, data_info)
preds = [{"group": info["group"], "value":pred} \
for info, pred in zip(data_info, preds)]
labels = [{"group": info["group"], "value": label}\
for info, label in zip(data_info, labels)]
return preds, labels
class Record(PostProcessor):
def process(self, preds, labels, data_info):
preds, labels = super().process(preds, labels, data_info)
labels = [info["answers"] for info in data_info]
return preds, labels
POSTPROCESSOR_MAPPING = OrderedDict(
[
('superglue-record', Record),
('superglue-multirc', MultiRC)
]
)
class AutoPostProcessor:
@classmethod
def get(self, task, tokenizer, ignore_pad_token_for_loss):
if task in POSTPROCESSOR_MAPPING:
return POSTPROCESSOR_MAPPING[task](tokenizer, ignore_pad_token_for_loss)
return PostProcessor(tokenizer, ignore_pad_token_for_loss)
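# Usage sketch (illustrative, assuming `preds`, `labels`, `data_info` and `tokenizer`
# come from the evaluation loop): decode model outputs before computing metrics.
#   post_processor = AutoPostProcessor.get("superglue-multirc", tokenizer, ignore_pad_token_for_loss=True)
#   decoded_preds, decoded_labels = post_processor.process(preds, labels, data_info)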

View File

@ -0,0 +1,584 @@
from collections import OrderedDict
import collections
import abc
import functools
from typing import Callable, List, Mapping
from examples_seq2seq.trainers.trainer_utils import pad_punctuation
from examples_seq2seq.metrics import metrics
from .utils import round_stsb_target
import datasets
import logging
import numpy as np
import torch
import re
logger = logging.getLogger(__name__)
class AbstractTask(abc.ABC):
name = NotImplemented
config = NotImplemented
prefix = NotImplemented
preprocessor: Callable = NotImplemented
metric = NotImplemented
metric_names = NotImplemented
split_map = None
labels_list = None
split_to_data_split: Mapping[str, str] = \
{"train": "train", "validation": "validation", "test": "test"}
small_datasets_without_all_splits = ["cola", "wnli", "rte", "superglue-cb", "superglue-copa", "superglue-multirc",
"superglue-wic", "superglue-wsc.fixed", "superglue-rte", "mrpc", "stsb",
"superglue-boolq"]
large_data_without_all_splits = ["qqp", "qnli", "superglue-record", "sst2"]
def __init__(self, config, seed=42):
self.config = config
self.seed = seed
def get_max_target_length(self, tokenizer, default_max_length):
if self.labels_list is not None:
return max([len(tokenizer.encode(label)) for label in self.labels_list])
return default_max_length
def seq2seq_format(self, sources: List[str],
targets: List[str],
add_prefix: bool=False,
prefix: str=None,
extra_fields={}):
src_prefix = self.name if prefix is None else prefix
sources = [src_prefix]+sources if add_prefix else sources
return {'source': ' '.join(sources),
'target': ' '.join(targets),
'task': self.name,
'extra_fields': extra_fields}
def check_n_obs(self, n_obs, total_size):
if n_obs is not None and n_obs > total_size:
n_obs = total_size
logger.warning("n_obs is set to %s", n_obs)
return n_obs
def shuffled_indices(self, dataset):
num_samples = len(dataset)
generator = torch.Generator()
generator.manual_seed(self.seed)
return torch.randperm(num_samples, generator=generator).tolist()
def subsample(self, dataset, n_obs=None, indices=None):
"""
Given a dataset returns the subsampled dataset.
:param n_obs: the number of samples of the subsampled dataset.
:param indices: indices to select the samples from, if not given, indices are computed
from by shuffling the given dataset.
:return: subsampled dataset.
"""
num_samples = len(dataset)
n_obs = self.check_n_obs(n_obs, num_samples)
if indices is None:
indices = self.shuffled_indices(dataset)
indices = indices[:n_obs]
return dataset.select(indices)
def load_dataset(self, split: int):
return datasets.load_dataset(self.name, self.config, split=split, script_version="master")
def get_split_indices(self, split, dataset, validation_size):
indices = self.shuffled_indices(dataset)
if split == "validation":
return indices[:validation_size]
else:
return indices[validation_size:]
def map_dataset(self, dataset, add_prefix):
return dataset.map(functools.partial(self.preprocessor, add_prefix=add_prefix),
remove_columns=dataset.column_names)
def get(self, split, add_prefix=True, n_obs=None, split_validation_test=False):
# For small datasets (n_samples < 10K) without test set, we divide validation set to
# half, use one half as test set and one half as validation set.
if split_validation_test and self.name in self.small_datasets_without_all_splits \
and split != "train":
mapped_split = self.split_to_data_split["validation"]
dataset = self.load_dataset(split=mapped_split)
indices = self.get_split_indices(split, dataset, validation_size=len(dataset)//2)
dataset = self.subsample(dataset, n_obs, indices)
# For larger datasets (n_samples > 10K), we divide training set into 1K as
# validation and the rest as training set, keeping the original validation
# set as the test set.
elif split_validation_test and self.name in self.large_data_without_all_splits \
and split != "test":
dataset = self.load_dataset(split="train")
indices = self.get_split_indices(split, dataset, validation_size=1000)
dataset = self.subsample(dataset, n_obs, indices)
else:
mapped_split = self.split_to_data_split[split]
dataset = self.load_dataset(split=mapped_split)
# shuffles the data and samples it.
if n_obs is not None:
dataset = self.subsample(dataset, n_obs)
return self.map_dataset(dataset, add_prefix)
class Squad(AbstractTask):
name = "squad"
metric = [metrics.squad]
def load_dataset(self, split):
return datasets.load_dataset(self.name, split=split, script_version="master")
def preprocessor(self, example, add_prefix):
answer = pad_punctuation(example['answers']['text'][0])
question = pad_punctuation(example['question'])
context = pad_punctuation(example['context'])
source = ["question:", question,
"context:", context]
target = [answer]
return self.seq2seq_format(source, target, add_prefix)
class MRPC(AbstractTask):
name = "mrpc"
labels_list = ["0", "1"]
metric = [metrics.f1_score_with_invalid, metrics.accuracy]
metric_names = ["f1", "accuracy"]
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
def load_dataset(self, split):
return datasets.load_dataset('glue', 'mrpc', split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["sentence1:", example['sentence1'],
"sentence2:", example["sentence2"]]
tgt_texts = [str(example['label'])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class COLA(AbstractTask):
name = "cola"
labels_list = ["0", "1"]
metric = [metrics.matthews_corrcoef]
metric_names = ["matthews_correlation"]
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
def load_dataset(self, split):
return datasets.load_dataset('glue', 'cola',
split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["sentence:", example['sentence']]
tgt_texts = [str(example['label'])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class SST2(AbstractTask):
name = "sst2"
labels_list = ["0", "1"]
metric = [metrics.accuracy]
metric_names = ["accuracy"]
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
def load_dataset(self, split):
return datasets.load_dataset('glue', 'sst2',
split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["sentence:", example['sentence']]
tgt_texts = [str(example['label'])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class STSB(AbstractTask):
name = "stsb"
labels_list = [str(np.round(label, decimals=1)) for label in np.arange(0, 5.2, 0.2)]
metric = [metrics.pearson_corrcoef, metrics.spearman_corrcoef]
metric_names = ["pearson", "spearmanr"]
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
def load_dataset(self, split):
return datasets.load_dataset('glue', 'stsb',
split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["sentence1:", example['sentence1'],
"sentence2:", example["sentence2"]]
tgt_texts = [str(round_stsb_target(example['label']))]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class QQP(AbstractTask):
name = "qqp"
labels_list = ["0", "1"]
metric = [metrics.f1_score_with_invalid, metrics.accuracy]
metric_names = ["f1", "accuracy"]
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
def load_dataset(self, split):
return datasets.load_dataset('glue', 'qqp',
split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["question1:", example['question1'],
"question2:", example["question2"]]
tgt_texts = [str(example['label'])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class MNLI(AbstractTask):
name = "mnli"
labels_list = ["0", "1", "2"]
split_to_data_split = {"train": "train",
"validation": "validation_mismatched",
"test": "validation_matched"}
metric = [metrics.accuracy]
metric_names = ["accuracy"]
def load_dataset(self, split):
return datasets.load_dataset('glue', 'mnli', split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["premise:", example['premise'],
"hypothesis", example["hypothesis"]]
tgt_texts = [str(example['label'])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class QNLI(AbstractTask):
name = "qnli"
labels_list = ["0", "1"]
metric = [metrics.accuracy]
metric_names = ["accuracy"]
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
def load_dataset(self, split):
return datasets.load_dataset('glue', 'qnli', split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["question:", example['question'],
"sentence:", example["sentence"]]
tgt_texts = [str(example['label'])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class RTE(AbstractTask):
name = "rte"
labels_list = ["0", "1"]
metric = [metrics.accuracy]
metric_names = ["accuracy"]
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
def load_dataset(self, split):
return datasets.load_dataset('glue', 'rte',
split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["sentence1:", example['sentence1'],
"sentence2:", example["sentence2"]]
tgt_texts = [str(example['label'])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class WNLI(AbstractTask):
name = "wnli"
labels_list = ["0", "1"]
metric = [metrics.accuracy]
metric_names = ["accuracy"]
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
def load_dataset(self, split):
return datasets.load_dataset('glue', 'wnli', split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["sentence1:", example['sentence1'],
"sentence2:", example["sentence2"]]
tgt_texts = [str(example['label'])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class SuperGLUEBoolQ(AbstractTask):
name="superglue-boolq"
labels_list = ['0', '1']
metric = [metrics.accuracy]
metric_names = ["accuracy"]
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
def load_dataset(self, split):
return datasets.load_dataset('super_glue', 'boolq', split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["question:", example["question"], "passage:", example["passage"]]
tgt_texts = [str(example["label"])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class SuperGLUERTE(AbstractTask):
name="superglue-rte"
labels_list = ['0', '1']
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
metric = [metrics.accuracy]
metric_names = ["accuracy"]
def load_dataset(self, split):
return datasets.load_dataset('super_glue', 'rte', split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["premise:", example["premise"],
"hypothesis:", example["hypothesis"]]
tgt_texts = [str(example["label"])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class SuperGLUECB(AbstractTask):
name = "superglue-cb"
labels_list = ['0', '1', '2']
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
metric = [metrics.mean_multiclass_f1(num_classes=3), metrics.accuracy]
metric_names = ["f1_multiclass", "accuracy"]
def load_dataset(self, split):
return datasets.load_dataset('super_glue', 'cb', split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["premise:", example["premise"], "hypothesis:", example["hypothesis"]]
tgt_texts = [str(example["label"])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class SuperGLUECOPA(AbstractTask):
name = "superglue-copa"
labels_list = ['0', '1']
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
metric = [metrics.accuracy]
metric_names = ["accuracy"]
def load_dataset(self, split):
return datasets.load_dataset('super_glue', 'copa', split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["premise:", example["premise"],
"choice1:", example["choice1"],
"choice2:", example["choice2"]]
tgt_texts = [str(example["label"])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class SuperGLUEMultiRC(AbstractTask):
name = "superglue-multirc"
labels_list = ['0', '1']
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
metric = [metrics.multirc_f1_over_all_answers,
metrics.mean_group_metric(metrics.exact_match)]
metric_names = ["f1", "em"]
def load_dataset(self, split):
return datasets.load_dataset('super_glue', 'multirc', split=split, script_version="master")
def remove_markup(self, text):
"""Removes the HTML markup."""
text = re.sub('<br>', ' ', text)
text = re.sub('<(/)?b>', '', text)
return text
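    # Illustrative example (not from the original code):
    #   remove_markup("<b>Who</b> wrote it?<br>He did.")  ->  "Who wrote it? He did."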
def preprocessor(self, example, add_prefix=True):
group = example['idx']['question']
# T5 applies remove_markup to the joined string, but this should not make
# any difference here.
# https://github.com/google-research/text-to-text-transfer-transformer/blob/a1352e625db7ec114062f99d99b0565b9e45c155/t5/data/preprocessors.py#L797
src_texts = ["question:", self.remove_markup(example["question"]),
"answer:", self.remove_markup(example["answer"]),
"paragraph:", self.remove_markup(example["paragraph"])]
tgt_texts = [str(example["label"])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix, extra_fields={"group": group})
class SuperGLUEWIC(AbstractTask):
name = "superglue-wic"
labels_list = ['0', '1']
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
metric = [metrics.accuracy]
metric_names = ["accuracy"]
def load_dataset(self, split):
return datasets.load_dataset('super_glue', 'wic', split=split, script_version="master")
def preprocessor(self, example, add_prefix=True):
src_texts = ["sentence1:", example["sentence1"],
"sentence2:", example["sentence2"],
"word:", example["word"]]
tgt_texts = [str(example["label"])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class SuperGLUEWSCFixed(AbstractTask):
# source: https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/data/preprocessors.py
"""Convert WSC examples to text2text format.
WSC includes a sentence along with 2 'spans': the first denoting a noun and
the other a pronoun. The 'label' specifies whether or not the pronoun is
referencing the noun. This preprocessor puts ' * ' around the noun and ' # '
around the pronoun.
For example, a typical example from WSC might look like
{
'text': 'This is a test sentence .',
'span1_text': 'test',
'span1_index': 3,
'span2_text': 'This',
'span2_index': 0,
'label': 0
}
This example would be transformed to
{
'inputs': 'wsc text: # This # is a * test * sentence .',
'targets': 'False'
}
"""
name = "superglue-wsc.fixed"
labels_list = ['0', '1']
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
metric = [metrics.accuracy]
metric_names = ["accuracy"]
def load_dataset(self, split):
return datasets.load_dataset('super_glue', 'wsc.fixed', split=split, script_version="master")
def _mark_span(self, text, span_str, span_idx, mark):
pattern_tmpl = r'^((?:\S+\s){N})(W)'
pattern = re.sub('N', str(span_idx), pattern_tmpl)
pattern = re.sub('W', span_str, pattern)
return re.sub(pattern, r'\1{0} \2 {0}'.format(mark), text)
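    # Illustrative example (not from the original code), using the sample from the
    # class docstring above:
    #   _mark_span("This is a test sentence .", "test", 3, "*")
    #   ->  "This is a * test * sentence ."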
def preprocessor(self, example, add_prefix=True):
# converts text as done in T5.
text = example['text']
text = self._mark_span(text, example['span1_text'], example['span1_index'], '*')
# Compensate for 2 added "words" added in previous step.
span2_index = example['span2_index'] + 2 * int(example['span1_index'] < example['span2_index'])
text = self._mark_span(text, example['span2_text'], span2_index, '#')
src_texts = ["text:", text]
tgt_texts = [str(example["label"])]
return self.seq2seq_format(src_texts, tgt_texts, add_prefix)
class SuperGLUERecord(AbstractTask):
"""Convert ReCoRD examples to text2text examples.
ReCoRD contains a passage, query containing a '@placeholder' string, and a set
of entities that are the possible values of the placeholder. Each train and
validation example will have a list of answers, any of which would be
considered correct.
For example, a typical example from ReCoRD might look like
{
'passage': 'This is the passage.',
'query': 'A @placeholder is a bird.',
'entities': ['penguin', 'potato', 'pigeon'],
'answers': ['penguin', 'pigeon'],
}
which this preprocessor would turn into the following two examples:
{
'inputs': 'record query: A @placeholder is a bird. entities: penguin, '
'potato, pigeon passage: This is the passage.',
'targets': 'penguin',
}
and
{
'inputs': 'record query: A @placeholder is a bird. entities: penguin, '
'potato, pigeon passage: This is the passage.',
'targets': 'pigeon',
}
"""
name = "superglue-record"
split_to_data_split = {"train": "train",
"validation": "validation",
"test": "validation"}
metric = [metrics.squad]
metric_names = ["squad"]
def load_dataset(self, split):
return datasets.load_dataset('super_glue', 'record', split=split, script_version="master")
def preprocessor(self, batch, add_prefix=True):
new_batch = collections.defaultdict(list)
keys = batch.keys()
for values in zip(*batch.values()):
ex = {k: v for k, v in zip(keys, values)}
# updates the passage.
passage = ex['passage']
passage = re.sub(r'(\.|\?|\!|\"|\')\n@highlight\n', r'\1 ', passage)
passage = re.sub(r'\n@highlight\n', '. ', passage)
inputs = f"record query: {ex['query']} entities: {', '.join(ex['entities'])} passage: {passage}"
if add_prefix:
inputs = self.name + " " + inputs
# duplicates the samples based on number of answers.
num_answers = len(ex["answers"])
num_duplicates = np.maximum(1, num_answers)
new_batch["source"].extend([inputs] * num_duplicates)
new_batch["target"].extend(ex["answers"] if num_answers > 0 else ["<unk>"])
new_batch["task"].extend([self.name] * num_duplicates)
new_batch["extra_fields"].extend([{"answers": ex["answers"]}]*num_duplicates)
return new_batch
def map_dataset(self, dataset, add_prefix=True):
return dataset.map(functools.partial(self.preprocessor, add_prefix=add_prefix),
batched=True, remove_columns=dataset.column_names)
TASK_MAPPING = OrderedDict(
[
('squad', Squad),
('mrpc', MRPC),
('cola', COLA),
('sst2', SST2),
('qnli', QNLI),
('rte', RTE),
('wnli', WNLI),
('mnli', MNLI),
('qqp', QQP),
('stsb', STSB),
('superglue-boolq', SuperGLUEBoolQ),
('superglue-rte', SuperGLUERTE),
('superglue-cb', SuperGLUECB),
('superglue-copa', SuperGLUECOPA),
('superglue-multirc', SuperGLUEMultiRC),
('superglue-wic', SuperGLUEWIC),
('superglue-wsc.fixed', SuperGLUEWSCFixed),
('superglue-record', SuperGLUERecord)
]
)
class AutoTask:
    @classmethod
    def get(cls, task, config, seed=42):
        if task in TASK_MAPPING:
            return TASK_MAPPING[task](config, seed)
        raise ValueError(
            "Unrecognized task {} for AutoTask Model.\n"
            "Task name should be one of {}.".format(
                task, ", ".join(TASK_MAPPING.keys())
            )
        )
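# A minimal usage sketch (not part of the original file). Running this module directly
# lists the registered task names; the commented-out call shows how a task object is
# typically obtained. Passing config=None is an assumption made only for illustration --
# the training script passes its parsed config object instead.
if __name__ == "__main__":
    print("Registered tasks:", ", ".join(TASK_MAPPING.keys()))
    # task = AutoTask.get("superglue-boolq", config=None, seed=42)
    # print(task.name, task.metric_names)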

View File

@ -0,0 +1,17 @@
import numpy as np
def round_stsb_target(label):
"""STSB maps two sentences to a floating point number between 1 and 5
representing their semantic similarity. Since we are treating all tasks as
text-to-text tasks we need to convert this floating point number to a string.
The vast majority of the similarity score labels in STSB are in the set
[0, 0.2, 0.4, ..., 4.8, 5.0]. So, we first round the number to the closest
entry in this set, and then we convert the result to a string (literally e.g.
"3.4"). This converts STSB roughly into a 26-class classification dataset.
Args:
label: original label.
Returns:
A preprocessed label.
"""
return np.round(np.round(label * 5) / 5, decimals=1)
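# A small usage sketch (not part of the original file): running this module directly
# prints a few raw STSB scores rounded to the nearest multiple of 0.2, matching the
# label set described in the docstring above.
if __name__ == "__main__":
    for raw_score in (0.0, 1.3, 2.98, 4.75):
        print(raw_score, "->", round_stsb_target(raw_score))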

View File

@ -0,0 +1,173 @@
# several of the evaluation metrics are from https://github.com/google-research/text-to-text-transfer-transformer/blob/a1352e625db7ec114062f99d99b0565b9e45c155/t5/evaluation/metrics.py
"""Defines different metrics used for evaluation of tasks."""
import numpy as np
import scipy
import math
import sklearn
import collections
from logging import getLogger
from .qa_utils import normalize_squad, qa_metrics
import sklearn.metrics
logger = getLogger(__name__)
def accuracy(predictions, targets) -> dict:
"""Computes the average accuracy."""
return {"accuracy": 100 * ((np.array(predictions) == np.array(targets)).mean())}
def pearson_corrcoef(predictions, targets) -> dict:
"""Computes Pearson correlation coefficient."""
from examples_seq2seq.data_processors.postprocessors import string_to_float
targets = [string_to_float(target) for target in targets]
predictions = [string_to_float(prediction) for prediction in predictions]
pearson_corrcoef = 100 * scipy.stats.pearsonr(targets, predictions)[0]
# Note that if all the predictions are the same, the Pearson
# correlation is nan; to guard against this, we check the output
# and return 0 in this case.
if math.isnan(pearson_corrcoef):
pearson_corrcoef = 0
return {"pearson": pearson_corrcoef}
def spearman_corrcoef(predictions, targets) -> dict:
"""Computes Spearman correlation coefficient."""
# TODO: we need to do postprocessors in a clean way for each dataset.
from examples_seq2seq.data_processors.postprocessors import string_to_float
targets = [string_to_float(target) for target in targets]
predictions = [string_to_float(prediction) for prediction in predictions]
spearman_corrcoef = 100 * scipy.stats.spearmanr(targets, predictions)[0]
# Note that if all the predictions are the same, the Spearman
# correlation is nan; to guard against this, we check the output
# and return 0 in this case.
if math.isnan(spearman_corrcoef):
spearman_corrcoef = 0
return {"spearmanr": spearman_corrcoef}
def f1_score_with_invalid(predictions, targets) -> dict:
"""Computes F1 score, with any prediction != 0 or 1 is counted as incorrect.
Args:
targets: list of targets, either 0 or 1
predictions: list of predictions, any integer value
Returns:
F1 score, where any prediction != 0 or 1 is counted as wrong.
"""
def binary_reverse(labels):
return ['0' if label == '1' else '1' for label in labels]
targets, predictions = np.asarray(targets), np.asarray(predictions)
# Get indices of invalid predictions.
invalid_idx_mask = np.logical_and(predictions != '0', predictions != '1')
# For any prediction != 0 or 1, we set the prediction to the opposite of its corresponding target.
predictions[invalid_idx_mask] = binary_reverse(targets[invalid_idx_mask])
targets = targets.astype(np.int32)
predictions = predictions.astype(np.int32)
return {"f1": 100 * sklearn.metrics.f1_score(targets, predictions)}
# TODO: maybe guard against invalid values https://stackoverflow.com/questions/56865344/how-do-i-calculate-the-matthews-correlation-coefficient-in-tensorflow
def matthews_corrcoef(predictions, targets) -> dict:
"""Computes the Matthews correlation coefficient."""
return {"matthews_correlation": 100 * sklearn.metrics.matthews_corrcoef(targets, predictions)}
def squad(predictions, targets):
"""Computes SQuAD metrics, maximizing over answers per question.
Args:
targets: list of lists of strings
predictions: list of strings
Returns:
dict with score_key: squad score across all targets and predictions
"""
targets = [[normalize_squad(t) for t in u] for u in targets]
predictions = [normalize_squad(p) for p in predictions]
return qa_metrics(targets, predictions)
def exact_match(predictions, targets):
"""Computes whether the targets match predictions exactly."""
return {"em": 100 * float(np.array_equal(targets, predictions))}
def sklearn_metrics_wrapper(metric_str,
metric_dict_str=None,
metric_post_process_fn=None,
**metric_fn_kwargs):
"""Wraps any sklearn.metric function and returns a t5 metric function.
Args:
metric_str: string, the function from `sklearn.metrics` to use.
metric_dict_str: optional string, if not specified `metric_str` is used as
the key in the returned dictionary.
metric_post_process_fn: callable, if specified the final computed metric
will be passed through this.
**metric_fn_kwargs: kwargs, passed to the metric function we are calling.
Returns:
the function that calculates the metric in a dict.
"""
if not hasattr(sklearn.metrics, metric_str):
raise ValueError("sklearn.metrics does not have: %s" % metric_str)
def fn(predictions, targets):
metric_fn = getattr(sklearn.metrics, metric_str)
metric_val = metric_fn(targets, predictions, **metric_fn_kwargs)
if metric_post_process_fn is not None:
metric_val = metric_post_process_fn(metric_val)
return {metric_dict_str or metric_str: metric_val}
return fn
def mean_multiclass_f1(num_classes, **metric_fn_kwargs):
"""Computes the unweighted average of the F1 per class."""
return sklearn_metrics_wrapper(
"fbeta_score",
metric_dict_str="f1_multiclass",
metric_post_process_fn=lambda x: 100 * x,
beta=1,
labels=range(num_classes),
average="macro",
**metric_fn_kwargs)
def multirc_f1_over_all_answers(targets, predictions):
"""Special metric for MultiRC which computes F1 score over all examples.
This is necessary because the targets/predictions for MultiRC are dicts and
the f1_score_with_invalid expects a list of True/False labels, not dicts. As
a result we just need to key in the "value" for each of the example dicts
before feeding into f1_score_with_invalid.
Args:
targets: list of dicts, where each dict has a "value" key.
predictions: list of dicts, where each dict has a "value" key.
Returns:
F1 score over values, where any prediction != 0 or 1 is counted as wrong.
"""
return f1_score_with_invalid(
[t["value"] for t in targets], [p["value"] for p in predictions]
)
def mean_group_metric(metric_fn, group_key="group", value_key="value"):
"""Returns a metric that averages `metric_fn` on sub-groups of results.
The sub-groups are defined by aggregating results (targets and predictions)
by accessing the feature specified by `group_key` in the target dicts.
**WARNING**: Using this function can produce unreliable results if you do not
pass in full groups. For example, if you evaluate over a random subsample of a
validation set and do not retain all of the examples in each group, you may
get results which aren't directly comparable to using the full validation set.
Args:
metric_fn: function, the metric to compute on the subgroups.
group_key: string, the key for the grouping value in the target dictionary.
value_key: string, the key for the value in the dictionaries.
"""
def my_metric(targets, predictions):
"""Computes mean of `metric_fn` over subgroups of results."""
grouped_values = collections.defaultdict(lambda: ([], []))
for targ, pred in zip(targets, predictions):
g = targ[group_key]
grouped_values[g][0].append(targ[value_key])
grouped_values[g][1].append(pred[value_key])
group_scores = collections.defaultdict(list)
for (targets, predictions) in grouped_values.values():
for metric, score in metric_fn(targets, predictions).items():
group_scores[metric].append(score)
return {metric: np.mean(scores) for metric, scores in group_scores.items()}
return my_metric
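# A small self-check sketch (not part of the original module) showing how a few of the
# metrics above are called; all predictions/targets below are invented toy values.
if __name__ == "__main__":
    print(accuracy(["1", "0", "1"], ["1", "1", "1"]))
    print(f1_score_with_invalid(["1", "2", "0"], ["1", "0", "0"]))  # "2" is an invalid prediction
    print(exact_match(["a", "b"], ["a", "b"]))
    grouped_em = mean_group_metric(exact_match)
    print(grouped_em([{"group": "q1", "value": "1"}, {"group": "q2", "value": "0"}],
                     [{"group": "q1", "value": "1"}, {"group": "q2", "value": "1"}]))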

View File

@ -0,0 +1,96 @@
# Copyright 2021 The T5 Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# source: the codes are from https://github.com/google-research/text-to-text-transfer-transformer
"""Utilities for Question Answering (QA) evaluation.
Matches results on the SQuAD (v1.1) and TriviaQA (v1.0) evaluation scripts.
"""
import collections
import string
import regex as re
import numpy as np
def _normalize_answer(text, punc_chars, punc_repl):
"""Lower text and remove punctuation, articles and extra whitespace."""
def remove_articles(s):
return re.sub(r"\b(a|an|the)\b", " ", s)
def replace_punctuation(s):
to_replace = set(punc_chars)
return "".join(punc_repl if ch in to_replace else ch for ch in s)
def white_space_fix(s):
return " ".join(s.split())
text = text.lower()
text = replace_punctuation(text)
text = remove_articles(text)
text = white_space_fix(text)
return text
def normalize_trivia_qa(answer):
"""Normalization used in official TriviaQA evaluation script."""
return _normalize_answer(
answer, punc_chars=string.punctuation + "´`_", punc_repl=" ").strip()
def normalize_squad(answer):
"""Normalization used in official SQuAD evaluation script."""
return _normalize_answer(answer, punc_chars=string.punctuation, punc_repl="")
def _metric_max_over_ground_truths(metric_fn, ground_truths, prediction):
"""Computes the maximum of the metric over all ground truths."""
return max(
metric_fn(ground_truth, prediction) for ground_truth in ground_truths
)
def _exact_match_score(target, prediction):
return target == prediction
def _f1_score(target, prediction):
"""Computes token f1 score for a single target and prediction."""
prediction_tokens = prediction.split()
target_tokens = target.split()
common = (collections.Counter(prediction_tokens) &
collections.Counter(target_tokens))
num_same = sum(common.values())
if num_same == 0:
return 0
precision = 1.0 * num_same / len(prediction_tokens)
recall = 1.0 * num_same / len(target_tokens)
f1 = (2 * precision * recall) / (precision + recall)
return f1
def qa_metrics(targets, predictions):
"""Computes exact match and f1 QA scores, expecting pre-normalized text."""
if len(targets) != len(predictions):
raise ValueError("Number of targets and predictions must match.")
em = np.mean([
_metric_max_over_ground_truths(_exact_match_score, t, p)
for p, t in zip(predictions, targets)
])
f1 = np.mean([
_metric_max_over_ground_truths(_f1_score, t, p)
for p, t in zip(predictions, targets)
])
em *= 100
f1 *= 100
return {"em": em, "f1": f1}

View File

@ -0,0 +1,7 @@
files=(cola mnli mrpc qnli qqp rte sst2 stsb superglue-boolq superglue-cb superglue-copa superglue-multirc superglue-record superglue-wic superglue-wsc.fixed)
for ((i=$1; i<=$2; i++))
do
dataset=${files[i]}
echo "id$i:$dataset"
TOKENIZERS_PARALLELISM=false python run_seq2seq.py configs/$3/$dataset.json
done
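# Usage sketch (not part of the original script; the config directory name is a
# placeholder): run tasks 0..2 of the `files` array above (cola, mnli, mrpc) using the
# JSON configs stored under configs/<your_config_dir>/<task>.json:
#   bash <this_script>.sh 0 2 <your_config_dir>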

View File

@ -0,0 +1,468 @@
# coding=utf-8
# Copyright The HuggingFace Team and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Fine-tuning the library models for sequence to sequence.
"""
# You can also adapt this script on your own sequence to sequence task. Pointers for this are left as comments.
import functools
import logging
from opendelta.utils.delta_hub import create_hub_repo_name
import torch
import os
os.environ['MKL_THREADING_LAYER'] = 'GNU'
os.environ['MKL_SERVICE_FORCE_INTEL'] = '1'
import sys
import subprocess
from typing import Optional, List
from datasets import load_dataset, load_metric, concatenate_datasets
import transformers
from transformers import (
AutoConfig,
AutoModelForSeq2SeqLM,
AutoTokenizer,
HfArgumentParser,
MBartTokenizer,
default_data_collator,
set_seed,
)
from transformers.trainer_utils import is_main_process, get_last_checkpoint
# from ..seq2seq.utils import get_adapter_config
from examples_seq2seq.data_processors import AutoTask, TaskDataCollatorForSeq2Seq, AutoPostProcessor
from examples_seq2seq.seq2seq_trainer import Seq2SeqTrainer
# from training_args import AdapterTrainingArguments
from examples_seq2seq.trainers.trainer_utils import save_training_config
from dataclasses import dataclass, field
from transformers.models.t5.modeling_t5 import T5Config, T5ForConditionalGeneration
from examples_seq2seq.trainers.model_args import ModelArguments
from examples_seq2seq.trainers.trainer_args import TrainingArguments, DataTrainingArguments
logger = logging.getLogger(__name__)
def run_command(command):
output = subprocess.getoutput(command)
return output
TASK_TO_METRICS = {"mrpc": ["accuracy", "f1"],
"cola": ['matthews_correlation'],
"stsb": ['pearson', 'spearmanr'],
'sst2': ['accuracy'],
"mnli": ["accuracy"],
"mnli_mismatched": ["accuracy"],
"mnli_matched": ["accuracy"],
"qnli": ["accuracy"],
"rte": ["accuracy"],
"wnli": ["accuracy"],
"qqp": ["accuracy", "f1"],
"superglue-boolq": ["accuracy"],
"superglue-rte": ["accuracy"],
"superglue-cb": ["f1_multiclass", "accuracy"],
"superglue-copa": ["accuracy"],
"superglue-multirc": ["f1", "em"],
"superglue-wic": ["accuracy"],
"superglue-wsc.fixed": ["accuracy"],
"superglue-record": ["f1", "em"]
}
class RemainArgHfArgumentParser(HfArgumentParser):
def parse_json_file(self, json_file: str, return_remaining_args=True ):
"""
Alternative helper method that does not use `argparse` at all, instead loading a json file and populating the
dataclass types.
"""
import argparse
import json
from pathlib import Path
import dataclasses
data = json.loads(Path(json_file).read_text())
outputs = []
for dtype in self.dataclass_types:
keys = {f.name for f in dataclasses.fields(dtype) if f.init}
inputs = {k: data.pop(k) for k in list(data.keys()) if k in keys}
obj = dtype(**inputs)
outputs.append(obj)
remain_args = argparse.ArgumentParser()
remain_args.__dict__.update(data)
if return_remaining_args:
return (*outputs, remain_args)
else:
return (*outputs,)
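# Illustrative sketch (not part of the original file) of the kind of JSON config this
# parser consumes: keys matching fields of ModelArguments, DataTrainingArguments and
# TrainingArguments are routed to those dataclasses, and any remaining keys (e.g. the
# delta-tuning options such as "delta_type") are returned as the extra namespace-like
# object. The concrete values below are assumptions made only for illustration.
#
# {
#     "model_name_or_path": "t5-base",
#     "task_name": "rte",
#     "eval_dataset_name": "rte",
#     "test_dataset_name": "rte",
#     "output_dir": "outputs/rte_example",
#     "do_train": true,
#     "do_eval": true,
#     "do_test": true,
#     "predict_with_generate": true,
#     "delta_type": "lora"
# }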
def main():
# See all possible arguments in src/transformers/training_args.py
# or by passing the --help flag to this script.
# We now keep distinct sets of args, for a cleaner separation of concerns.
parser = RemainArgHfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
# If we pass only one argument to the script and it's the path to a json file,
# let's parse it to get our arguments.
model_args, data_args, training_args, delta_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
else:
model_args, data_args, training_args, delta_args = parser.parse_args_into_dataclasses()
# Detecting last checkpoint.
last_checkpoint = None
if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir:
last_checkpoint = get_last_checkpoint(training_args.output_dir)
print("#### last_checkpoint ", last_checkpoint)
if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0:
'''
raise ValueError(
f"Output directory ({training_args.output_dir}) already exists and is not empty. "
"Use --overwrite_output_dir to overcome."
)
'''
pass
elif last_checkpoint is not None:
logger.info(
f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change "
"the `--output_dir` or add `--overwrite_output_dir` to train from scratch."
)
# Setup logging
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
handlers=[logging.StreamHandler(sys.stdout)],
)
logger.setLevel(logging.INFO if is_main_process(training_args.local_rank) else logging.WARN)
# Log on each process the small summary:
logger.warning(
f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
+ f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
)
# Set the verbosity to info of the Transformers logger (on main process only):
if is_main_process(training_args.local_rank):
transformers.utils.logging.set_verbosity_info()
logger.info("Training/evaluation parameters %s", training_args)
# Set seed before initializing model.
set_seed(training_args.seed)
# Get the datasets: you can either provide your own CSV/JSON training and evaluation files (see below)
# or just provide the name of one of the public datasets available on the hub at https://huggingface.co/datasets/
# (the dataset will be downloaded automatically from the datasets Hub).
#
# For CSV/JSON files in the summarization task, this script will use the first column for the full texts and the
# second column for the summaries (unless you specify column names for this with the `text_column` and
# `summary_column` arguments).
# For translation, only JSON files are supported, with one field named "translation" containing two keys for the
# source and target languages (unless you adapt what follows).
#
# In distributed training, the load_dataset function guarantee that only one local process can concurrently
# download the dataset.
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html.
# Load pretrained model and tokenizer
#
# Distributed training:
# The .from_pretrained methods guarantee that only one local process can concurrently
# download model & vocab.
config = AutoConfig.from_pretrained(
model_args.config_name if model_args.config_name else model_args.model_name_or_path,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
config.dropout_rate = 0.0
tokenizer = AutoTokenizer.from_pretrained(
model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
cache_dir=model_args.cache_dir,
use_fast=model_args.use_fast_tokenizer,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
model = AutoModelForSeq2SeqLM.from_pretrained(
model_args.model_name_or_path,
from_tf=bool(".ckpt" in model_args.model_name_or_path),
config=config,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
model.resize_token_embeddings(len(tokenizer))
if delta_args.delta_type.lower() != "none":
from opendelta import AutoDeltaConfig,AutoDeltaModel
delta_config = AutoDeltaConfig.from_dict(vars(delta_args))
delta_model = AutoDeltaModel.from_config(delta_config, backbone_model=model)
delta_model.freeze_module(set_state_dict = True)
delta_model.log(delta_ratio=True, trainable_ratio=True, visualization=True)
# model parallelize
if hasattr(training_args, "model_parallel") and training_args.model_parallel:
logger.info('parallelize model!')
model.parallelize()
data_args.dataset_name = [data_args.task_name]
data_args.eval_dataset_name = [data_args.eval_dataset_name]
data_args.test_dataset_name = [data_args.test_dataset_name]
data_args.dataset_config_name = [data_args.dataset_config_name]
data_args.eval_dataset_config_name = [data_args.eval_dataset_config_name]
data_args.test_dataset_config_name = [data_args.test_dataset_config_name]
assert len(data_args.dataset_name) == len(data_args.dataset_config_name)
if data_args.eval_dataset_name is not None:
assert len(data_args.eval_dataset_name) == len(data_args.eval_dataset_config_name)
if data_args.test_dataset_name is not None:
assert len(data_args.test_dataset_name) == len(data_args.test_dataset_config_name)
# Temporarily set max_target_length for training.
#max_target_length = data_args.max_target_length
padding = "max_length" if data_args.pad_to_max_length else False
def preprocess_function(examples, max_target_length):
# max_target_length += 1
# model_inputs = tokenizer([s+"<extra_id_0>" for s in examples['source']], max_length=data_args.max_source_length,
# padding=padding, truncation=True)
# # Setup the tokenizer for targets
# with tokenizer.as_target_tokenizer():
# labels = tokenizer(['<extra_id_0>'+t for t in examples['target']], max_length=max_target_length, padding=padding, truncation=True)
model_inputs = tokenizer([s for s in examples['source']], max_length=data_args.max_source_length,
padding=padding, truncation=True)
# Setup the tokenizer for targets
with tokenizer.as_target_tokenizer():
labels = tokenizer([t for t in examples['target']], max_length=max_target_length, padding=padding, truncation=True)
# If we are padding here, replace all tokenizer.pad_token_id in the labels by -100 when we want to ignore
# padding in the loss.
if padding == "max_length" and data_args.ignore_pad_token_for_loss:
labels["input_ids"] = [
[(l if l != tokenizer.pad_token_id else -100) for l in label] for label in labels["input_ids"]
]
model_inputs["labels"] = labels["input_ids"]
model_inputs["extra_fields"] = examples['extra_fields']
return model_inputs
column_names = ['source', 'target', 'extra_fields']
performance_metrics = {}
if training_args.do_train:
train_datasets = [AutoTask.get(dataset_name,
dataset_config_name,
seed=data_args.data_seed).get(
split="train",
split_validation_test=training_args.split_validation_test,
add_prefix=True,
n_obs=data_args.max_train_samples)
for dataset_name, dataset_config_name\
in zip(data_args.dataset_name, data_args.dataset_config_name)]
max_target_lengths = [AutoTask.get(dataset_name, dataset_config_name).get_max_target_length(\
tokenizer=tokenizer, default_max_length=data_args.max_target_length)\
for dataset_name, dataset_config_name in zip(data_args.dataset_name, data_args.dataset_config_name)]
for i, train_dataset in enumerate(train_datasets):
train_datasets[i] = train_datasets[i].map(
functools.partial(preprocess_function, max_target_length=max_target_lengths[i]),
batched=True,
num_proc=data_args.preprocessing_num_workers,
remove_columns=column_names, # if train_dataset != "superglue-record" else column_names+["answers"],
load_from_cache_file=not data_args.overwrite_cache,
)
train_dataset = concatenate_datasets(train_datasets)
if training_args.do_eval:
eval_datasets = {eval_dataset: AutoTask.get(eval_dataset, eval_dataset_config,
seed=data_args.data_seed).get(
split="validation",
split_validation_test=training_args.split_validation_test,
add_prefix=True,
n_obs=data_args.max_val_samples)
for eval_dataset, eval_dataset_config in zip(data_args.eval_dataset_name, data_args.eval_dataset_config_name)}
max_target_lengths = [AutoTask.get(dataset_name, dataset_config_name).get_max_target_length( \
tokenizer=tokenizer, default_max_length=data_args.max_target_length) \
for dataset_name, dataset_config_name in zip(data_args.eval_dataset_name, data_args.eval_dataset_config_name)]
for k, name in enumerate(eval_datasets):
eval_datasets[name] = eval_datasets[name].map(
functools.partial(preprocess_function, max_target_length=max_target_lengths[k]),
batched=True,
num_proc=data_args.preprocessing_num_workers,
remove_columns=column_names, # if name != "superglue-record" else column_names+["answers"],
load_from_cache_file=not data_args.overwrite_cache,
)
if training_args.do_test:
test_datasets = {test_dataset: AutoTask.get(test_dataset, test_dataset_config,
seed=data_args.data_seed).get(
split="test",
split_validation_test=training_args.split_validation_test,
add_prefix=True,
n_obs=data_args.max_test_samples)
for test_dataset, test_dataset_config in zip(data_args.test_dataset_name, data_args.test_dataset_config_name)}
max_target_lengths = [AutoTask.get(dataset_name, dataset_config_name).get_max_target_length( \
tokenizer=tokenizer, default_max_length=data_args.max_target_length) \
for dataset_name, dataset_config_name in zip(data_args.test_dataset_name, data_args.test_dataset_config_name)]
for k, name in enumerate(test_datasets):
test_datasets[name] = test_datasets[name].map(
functools.partial(preprocess_function, max_target_length=max_target_lengths[k]),
batched=True,
num_proc=data_args.preprocessing_num_workers,
remove_columns=column_names,
load_from_cache_file=not data_args.overwrite_cache,
)
# Data collator
label_pad_token_id = -100 if data_args.ignore_pad_token_for_loss else tokenizer.pad_token_id
if data_args.pad_to_max_length:
data_collator = default_data_collator
else:
data_collator = TaskDataCollatorForSeq2Seq(
tokenizer,
label_pad_token_id=label_pad_token_id,
pad_to_multiple_of=8 if training_args.fp16 else None,
)
# Metric, we assume we have only one training task.
eval_metrics = [AutoTask.get(dataset_name, dataset_config_name).metric\
for dataset_name, dataset_config_name in zip(data_args.dataset_name, data_args.dataset_config_name)][0]
# Extracts the extra information needed to evaluate on each dataset.
# These information are only used in the compute_metrics.
# We will assume that the test/eval dataloader does not change the order of
# the data.
data_info = {"eval": eval_datasets[data_args.eval_dataset_name[0]]['extra_fields'],
"test": test_datasets[data_args.test_dataset_name[0]]['extra_fields'],
"train": train_dataset['extra_fields']}
def compute_metrics(eval_preds):
preds, labels, data_info = eval_preds
post_processor = AutoPostProcessor.get(data_args.dataset_name[0], tokenizer,
data_args.ignore_pad_token_for_loss)
decoded_preds, decoded_labels = post_processor.process(preds, labels, data_info)
result = {}
for metric in eval_metrics:
result.update(metric(decoded_preds, decoded_labels))
return result
# Initialize our Trainer
trainer = Seq2SeqTrainer(
model=model,
args=training_args,
delta_args=delta_args,
train_dataset=train_dataset if training_args.do_train else None,
eval_dataset=list(eval_datasets.values())[0] if training_args.do_eval else None,
data_info = data_info,
tokenizer=tokenizer,
data_collator=data_collator,
compute_metrics=compute_metrics if training_args.predict_with_generate else None,
evaluation_metrics = TASK_TO_METRICS[data_args.dataset_name[0]],
)
# Saves training config.
if trainer.is_world_process_zero():
os.makedirs(training_args.output_dir, exist_ok=True)
save_training_config(sys.argv[1], training_args.output_dir)
# Training
if training_args.do_train:
checkpoint = None
if training_args.resume_from_checkpoint is not None:
checkpoint = training_args.resume_from_checkpoint
elif last_checkpoint is not None:
checkpoint = last_checkpoint
if training_args.compute_time:
torch.cuda.synchronize() # wait for move to complete
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
train_result = trainer.train(resume_from_checkpoint=checkpoint)
if training_args.compute_time:
end.record()
torch.cuda.synchronize() # wait for all_reduce to complete
total_time = start.elapsed_time(end)/(1000*60)
performance_metrics.update({"total_time in minutes ": total_time})
trainer.save_model() # Saves the tokenizer too for easy upload
train_metrics = train_result.metrics
max_train_samples = (
data_args.max_train_samples if data_args.max_train_samples is not None else len(train_dataset)
)
train_metrics["train_samples"] = min(max_train_samples, len(train_dataset))
trainer.log_metrics("train", train_metrics)
trainer.save_metrics("train", train_metrics)
trainer.save_state()
if torch.cuda.is_available() and training_args.compute_memory:
peak_memory = (torch.cuda.max_memory_allocated() / 1024 ** 2)/1000
print(
"Memory utilization",
peak_memory,
"GB"
)
performance_metrics.update({"peak_memory": peak_memory})
if training_args.compute_memory or training_args.compute_time:
print(performance_metrics)
trainer.save_metrics("performance", performance_metrics)
# Evaluation
results = {}
if training_args.do_eval:
logger.info("*** Evaluate ***")
for task, eval_dataset in eval_datasets.items():
metrics = trainer.evaluate(eval_dataset=eval_dataset,
max_length=data_args.val_max_target_length, num_beams=data_args.num_beams,
)
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)
results['evaluate'] = metrics
# Test
if training_args.do_test:
logger.info("*** Test ***")
for task, test_dataset in test_datasets.items():
metrics = trainer.evaluate(eval_dataset=test_dataset,
max_length=data_args.test_max_target_length, num_beams=data_args.num_beams,
metric_key_prefix="test"
)
trainer.log_metrics("test", metrics)
trainer.save_metrics("test", metrics)
results['test'] = metrics
repo_name = create_hub_repo_name(root="DeltaHub",
dataset=data_args.task_name,
delta_type = delta_args.delta_type,
model_name_or_path= model_args.model_name_or_path)
results['repo_name'] = repo_name
if training_args.push_to_hub: # TODO add description here
delta_model.save_finetuned(push_to_hub=True, save_directory=repo_name, use_auth_token=True)
# trainer.push_to_hub(**kwargs)
else:
delta_model.save_finetuned(push_to_hub=False, save_directory=repo_name, use_auth_token=True)
return results
if __name__ == "__main__":
result = main()
import json
with open("collect_result.jsonl", 'a') as fout:
string = json.dumps(result, indent=4,sort_keys=True)
fout.write(string+"\n")
print(result)
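# Typical invocation (sketch; the config path is a placeholder -- the accompanying
# shell script calls this file in the same way):
#   TOKENIZERS_PARALLELISM=false python run_seq2seq.py configs/<config_dir>/rte.json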

View File

@ -0,0 +1,127 @@
from packaging import version
import torch
from torch import nn
from typing import Any, Dict, List, Optional, Tuple, Union
from torch.utils.data.dataset import Dataset
from transformers import Seq2SeqTrainer as HfSeq2SeqTrainer
from examples_seq2seq.trainers.trainer import BaseTrainer
# if is_sagemaker_mp_enabled():
# import smdistributed.modelparallel.torch as smp
# from transformers.trainer_utils import ShardedDDPOption
# if is_fairscale_available():
# dep_version_check("fairscale")
# import fairscale
# from fairscale.nn.data_parallel import FullyShardedDataParallel as FullyShardedDDP
# from fairscale.nn.data_parallel import ShardedDataParallel as ShardedDDP
# from fairscale.nn.wrap import auto_wrap
# from fairscale.optim import OSS
# from fairscale.optim.grad_scaler import ShardedGradScaler
from transformers.optimization import Adafactor, AdamW, get_scheduler
from transformers.trainer_pt_utils import get_parameter_names, is_sagemaker_mp_enabled
from transformers.integrations import is_fairscale_available
if version.parse(torch.__version__) >= version.parse("1.6"):
from torch.cuda.amp import autocast
class Seq2SeqTrainer(HfSeq2SeqTrainer, BaseTrainer):
def __init__(self, train_dataset_sizes=None, delta_args=None, *args, **kwargs):
super().__init__(*args, **kwargs)
self.train_dataset_sizes = train_dataset_sizes
self.delta_args = delta_args
def evaluate(
self,
eval_dataset: Optional[Dict[str, Dataset]] = None,
ignore_keys: Optional[List[str]] = None,
metric_key_prefix: str = "eval",
max_length: Optional[int] = None,
num_beams: Optional[int] = None,
) -> Dict[str, float]:
# TODO: this also needs to be set per dataset
self._max_length = max_length
self._num_beams = num_beams
return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
def prediction_step(
self,
model: nn.Module,
inputs: Dict[str, Union[torch.Tensor, Any]],
prediction_loss_only: bool,
ignore_keys: Optional[List[str]] = None,
) -> Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]:
"""
Perform an evaluation step on :obj:`model` using obj:`inputs`.
Subclass and override to inject custom behavior.
Args:
model (:obj:`nn.Module`):
The model to evaluate.
inputs (:obj:`Dict[str, Union[torch.Tensor, Any]]`):
The inputs and targets of the model.
The dictionary will be unpacked before being fed to the model. Most models expect the targets under the
argument :obj:`labels`. Check your model's documentation for all accepted arguments.
prediction_loss_only (:obj:`bool`):
Whether or not to return the loss only.
Return:
Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]: A tuple with the loss, logits and
labels (each being optional).
"""
if not self.args.predict_with_generate or prediction_loss_only:
return super().prediction_step(
model, inputs, prediction_loss_only=prediction_loss_only, ignore_keys=ignore_keys
)
has_labels = "labels" in inputs
inputs = self._prepare_inputs(inputs)
gen_kwargs = {
"max_length": self._max_length if self._max_length is not None else self.model.config.max_length,
"num_beams": self._num_beams if self._num_beams is not None else self.model.config.num_beams,
}
generated_tokens = self.model.generate(
inputs["input_ids"],
attention_mask=inputs["attention_mask"],
**gen_kwargs,
)
# in case the batch is shorter than max length, the output should be padded
if generated_tokens.shape[-1] < gen_kwargs["max_length"]:
generated_tokens = self._pad_tensors_to_max_len(generated_tokens, gen_kwargs["max_length"])
with torch.no_grad():
if self.use_amp:
with autocast():
outputs = model(**inputs)
else:
outputs = model(**inputs)
if has_labels:
if self.label_smoother is not None:
loss = self.label_smoother(outputs, inputs["labels"]).mean().detach()
else:
loss = (outputs["loss"] if isinstance(outputs, dict) else outputs[0]).mean().detach()
else:
loss = None
if self.args.prediction_loss_only:
return (loss, None, None)
labels = inputs["labels"]
if labels.shape[-1] < gen_kwargs["max_length"]:
labels = self._pad_tensors_to_max_len(labels, gen_kwargs["max_length"])
return (loss, generated_tokens, labels)

View File

@ -0,0 +1,2 @@
from .trainer import BaseTrainer
from .seq2seq_trainer import Seq2SeqTrainer

View File

@ -0,0 +1,36 @@
from dataclasses import dataclass, field
from typing import Optional, List
@dataclass
class ModelArguments:
"""
Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
"""
model_name_or_path: str = field(
metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
)
config_name: Optional[str] = field(
default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
)
tokenizer_name: Optional[str] = field(
default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"}
)
cache_dir: Optional[str] = field(
default=None,
metadata={"help": "Where to store the pretrained models downloaded from huggingface.co"},
)
use_fast_tokenizer: bool = field(
default=True,
metadata={"help": "Whether to use one of the fast tokenizer (backed by the tokenizers library) or not."},
)
model_revision: str = field(
default="main",
metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."},
)
use_auth_token: bool = field(
default=False,
metadata={
"help": "Will use the token generated when running `transformers-cli login` (necessary to use this script "
"with private models)."
},
)

View File

@ -0,0 +1,108 @@
from packaging import version
import torch
from torch import nn
from typing import Any, Dict, List, Optional, Tuple, Union
from torch.utils.data.dataset import Dataset
from transformers import Seq2SeqTrainer as HfSeq2SeqTrainer
from .trainer import BaseTrainer
if version.parse(torch.__version__) >= version.parse("1.6"):
from torch.cuda.amp import autocast
class Seq2SeqTrainer(HfSeq2SeqTrainer, BaseTrainer):
def __init__(self, train_dataset_sizes=None, delta_args=None, *args, **kwargs):
super().__init__(*args, **kwargs)
self.train_dataset_sizes = train_dataset_sizes
self.delta_args = delta_args
def evaluate(
self,
eval_dataset: Optional[Dict[str, Dataset]] = None,
ignore_keys: Optional[List[str]] = None,
metric_key_prefix: str = "eval",
max_length: Optional[int] = None,
num_beams: Optional[int] = None,
) -> Dict[str, float]:
# TODO: this also needs to be set per dataset
self._max_length = max_length
self._num_beams = num_beams
return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
def prediction_step(
self,
model: nn.Module,
inputs: Dict[str, Union[torch.Tensor, Any]],
prediction_loss_only: bool,
ignore_keys: Optional[List[str]] = None,
) -> Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]:
"""
Perform an evaluation step on :obj:`model` using obj:`inputs`.
Subclass and override to inject custom behavior.
Args:
model (:obj:`nn.Module`):
The model to evaluate.
inputs (:obj:`Dict[str, Union[torch.Tensor, Any]]`):
The inputs and targets of the model.
The dictionary will be unpacked before being fed to the model. Most models expect the targets under the
argument :obj:`labels`. Check your model's documentation for all accepted arguments.
prediction_loss_only (:obj:`bool`):
Whether or not to return the loss only.
Return:
Tuple[Optional[float], Optional[torch.Tensor], Optional[torch.Tensor]]: A tuple with the loss, logits and
labels (each being optional).
"""
if not self.args.predict_with_generate or prediction_loss_only:
return super().prediction_step(
model, inputs, prediction_loss_only=prediction_loss_only, ignore_keys=ignore_keys
)
has_labels = "labels" in inputs
inputs = self._prepare_inputs(inputs)
gen_kwargs = {
"max_length": self._max_length if self._max_length is not None else self.model.config.max_length,
"num_beams": self._num_beams if self._num_beams is not None else self.model.config.num_beams,
}
generated_tokens = self.model.generate(
inputs["input_ids"],
attention_mask=inputs["attention_mask"],
**gen_kwargs,
)
# in case the batch is shorter than max length, the output should be padded
if generated_tokens.shape[-1] < gen_kwargs["max_length"]:
generated_tokens = self._pad_tensors_to_max_len(generated_tokens, gen_kwargs["max_length"])
with torch.no_grad():
if self.use_amp:
with autocast():
outputs = model(**inputs)
else:
outputs = model(**inputs)
if has_labels:
if self.label_smoother is not None:
loss = self.label_smoother(outputs, inputs["labels"]).mean().detach()
else:
loss = (outputs["loss"] if isinstance(outputs, dict) else outputs[0]).mean().detach()
else:
loss = None
if self.args.prediction_loss_only:
return (loss, None, None)
labels = inputs["labels"]
if labels.shape[-1] < gen_kwargs["max_length"]:
labels = self._pad_tensors_to_max_len(labels, gen_kwargs["max_length"])
return (loss, generated_tokens, labels)

View File

@ -0,0 +1,274 @@
from typing import Dict, List, Optional
import numpy as np
import time
import torch
import collections
from packaging import version
from torch.utils.data.dataset import Dataset
from transformers import Trainer
from transformers import logging
from transformers.trainer_utils import (
speed_metrics,
EvalLoopOutput,
denumpify_detensorize
)
from transformers.file_utils import is_torch_tpu_available
from transformers.trainer_pt_utils import (
find_batch_size,
nested_numpify,
nested_truncate,
nested_concat,
IterableDatasetShard
)
from .trainer_utils import EvalPrediction
from torch.utils.data.dataloader import DataLoader
from torch.utils.data.dataset import IterableDataset
from transformers.deepspeed import deepspeed_init
if version.parse(torch.__version__) >= version.parse("1.6"):
from torch.cuda.amp import autocast
if is_torch_tpu_available():
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met
import torch_xla.distributed.parallel_loader as pl
logger = logging.get_logger(__name__)
class BaseTrainer(Trainer):
def __init__(self, evaluation_metrics=[], data_info=None, *args, **kwargs):
"""When doing evaluation, it computes average of list of metrics
given in evaluation_metrics and adds it to the dictionary of results.
Trainer class then use this average metric to save the best model."""
super().__init__(*args, **kwargs)
self.evaluation_metrics = evaluation_metrics
self.data_info = data_info
def get_data_info(self, metric_key_prefix):
"""Returns the data information required to make the predictions/labels
suitable for the evaluation."""
if self.data_info is not None:
return self.data_info[metric_key_prefix]
return None
def evaluate(
self,
eval_dataset: Optional[Dataset] = None,
ignore_keys: Optional[List[str]] = None,
metric_key_prefix: str = "eval",
) -> Dict[str, float]:
"""
Run evaluation and returns metrics.
The calling script will be responsible for providing a method to compute metrics, as they are task-dependent
(pass it to the init :obj:`compute_metrics` argument).
You can also subclass and override this method to inject custom behavior.
Args:
eval_dataset (:obj:`Dataset`, `optional`):
Pass a dataset if you wish to override :obj:`self.eval_dataset`. If it is an :obj:`datasets.Dataset`,
columns not accepted by the ``model.forward()`` method are automatically removed. It must implement the
:obj:`__len__` method.
ignore_keys (:obj:`List[str]`, `optional`):
A list of keys in the output of your model (if it is a dictionary) that should be ignored when
gathering predictions.
metric_key_prefix (:obj:`str`, `optional`, defaults to :obj:`"eval"`):
An optional prefix to be used as the metrics key prefix. For example the metrics "bleu" will be named
"eval_bleu" if the prefix is "eval" (default)
Returns:
A dictionary containing the evaluation loss and the potential metrics computed from the predictions. The
dictionary also contains the epoch number which comes from the training state.
"""
# memory metrics - must set up as early as possible
self._memory_tracker.start()
eval_dataloader = self.get_eval_dataloader(eval_dataset)
start_time = time.time()
eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
output = eval_loop(
eval_dataloader,
description="Evaluation",
# No point gathering the predictions if there are no metrics, otherwise we defer to
# self.args.prediction_loss_only
prediction_loss_only=True if self.compute_metrics is None else None,
ignore_keys=ignore_keys,
metric_key_prefix=metric_key_prefix,
)
output.metrics.update(speed_metrics(metric_key_prefix, start_time, output.num_samples))
if len(self.evaluation_metrics) != 0:
selected_metrics = [output.metrics[metric_key_prefix+"_"+k] for k in self.evaluation_metrics if metric_key_prefix+"_"+k in output.metrics]
assert len(selected_metrics) >= 1, "at least one metric should be selected to compute the average_metrics."
output.metrics.update({metric_key_prefix+'_average_metrics': np.mean(selected_metrics)})
self.log(output.metrics)
if self.args.tpu_metrics_debug or self.args.debug:
# tpu-comment: Logging debug metrics for PyTorch/XLA (compile, execute times, ops, etc.)
xm.master_print(met.metrics_report())
self.control = self.callback_handler.on_evaluate(self.args, self.state, self.control, output.metrics)
self._memory_tracker.stop_and_update_metrics(output.metrics)
return output.metrics
def evaluation_loop(
self,
dataloader: DataLoader,
description: str,
prediction_loss_only: Optional[bool] = None,
ignore_keys: Optional[List[str]] = None,
metric_key_prefix: str = "eval",
) -> EvalLoopOutput:
"""
Prediction/evaluation loop, shared by :obj:`Trainer.evaluate()` and :obj:`Trainer.predict()`.
Works both with or without labels.
"""
prediction_loss_only = (
prediction_loss_only if prediction_loss_only is not None else self.args.prediction_loss_only
)
# if eval is called w/o train init deepspeed here
if self.args.deepspeed and not self.deepspeed:
# XXX: eval doesn't have `resume_from_checkpoint` arg but we should be able to do eval
# from the checkpoint eventually
deepspeed_engine, _, _ = deepspeed_init(self, num_training_steps=0, resume_from_checkpoint=None)
self.model = deepspeed_engine.module
self.model_wrapped = deepspeed_engine
self.deepspeed = deepspeed_engine
# XXX: we don't need optim/sched for inference, but this needs to be sorted out, since
# for example the Z3-optimizer is a must for zero3 to work even for inference - what we
# don't need is the deepspeed basic optimizer which is self.optimizer.optimizer
deepspeed_engine.optimizer.optimizer = None
deepspeed_engine.lr_scheduler = None
model = self._wrap_model(self.model, training=False)
# if full fp16 is wanted on eval and this ``evaluation`` or ``predict`` isn't called while
# ``train`` is running, halve it first and then put on device
if not self.is_in_train and self.args.fp16_full_eval:
model = model.half().to(self.args.device)
batch_size = dataloader.batch_size
logger.info(f"***** Running {description} *****")
if isinstance(dataloader.dataset, collections.abc.Sized):
logger.info(f" Num examples = {self.num_examples(dataloader)}")
else:
logger.info(" Num examples: Unknown")
logger.info(f" Batch size = {batch_size}")
model.eval()
self.callback_handler.eval_dataloader = dataloader
# Do this before wrapping.
eval_dataset = dataloader.dataset
if is_torch_tpu_available():
dataloader = pl.ParallelLoader(dataloader, [self.args.device]).per_device_loader(self.args.device)
if self.args.past_index >= 0:
self._past = None
# Initialize containers
# losses/preds/labels on GPU/TPU (accumulated for eval_accumulation_steps)
losses_host = None
preds_host = None
labels_host = None
# losses/preds/labels on CPU (final containers)
all_losses = None
all_preds = None
all_labels = None
# Will be useful when we have an iterable dataset so don't know its length.
observed_num_examples = 0
# Main evaluation loop
for step, inputs in enumerate(dataloader):
# Update the observed num examples
observed_batch_size = find_batch_size(inputs)
if observed_batch_size is not None:
observed_num_examples += observed_batch_size
# Prediction step
loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
# Update containers on host
if loss is not None:
losses = self._nested_gather(loss.repeat(batch_size))
losses_host = losses if losses_host is None else torch.cat((losses_host, losses), dim=0)
if logits is not None:
logits = self._pad_across_processes(logits)
logits = self._nested_gather(logits)
preds_host = logits if preds_host is None else nested_concat(preds_host, logits, padding_index=-100)
if labels is not None:
labels = self._pad_across_processes(labels)
labels = self._nested_gather(labels)
labels_host = labels if labels_host is None else nested_concat(labels_host, labels, padding_index=-100)
self.control = self.callback_handler.on_prediction_step(self.args, self.state, self.control)
# Gather all tensors and put them back on the CPU if we have done enough accumulation steps.
if self.args.eval_accumulation_steps is not None and (step + 1) % self.args.eval_accumulation_steps == 0:
if losses_host is not None:
losses = nested_numpify(losses_host)
all_losses = losses if all_losses is None else np.concatenate((all_losses, losses), axis=0)
if preds_host is not None:
logits = nested_numpify(preds_host)
all_preds = logits if all_preds is None else nested_concat(all_preds, logits, padding_index=-100)
if labels_host is not None:
labels = nested_numpify(labels_host)
all_labels = (
labels if all_labels is None else nested_concat(all_labels, labels, padding_index=-100)
)
# Set back to None to begin a new accumulation
losses_host, preds_host, labels_host = None, None, None
if self.args.past_index and hasattr(self, "_past"):
# Clean the state at the end of the evaluation loop
delattr(self, "_past")
# Gather all remaining tensors and put them back on the CPU
if losses_host is not None:
losses = nested_numpify(losses_host)
all_losses = losses if all_losses is None else np.concatenate((all_losses, losses), axis=0)
if preds_host is not None:
logits = nested_numpify(preds_host)
all_preds = logits if all_preds is None else nested_concat(all_preds, logits, padding_index=-100)
if labels_host is not None:
labels = nested_numpify(labels_host)
all_labels = labels if all_labels is None else nested_concat(all_labels, labels, padding_index=-100)
# Number of samples
if not isinstance(eval_dataset, IterableDataset):
num_samples = len(eval_dataset)
elif isinstance(eval_dataset, IterableDatasetShard):
num_samples = eval_dataset.num_examples
else:
num_samples = observed_num_examples
# The number of losses has been rounded to a multiple of batch_size, and in distributed training the number of
# samples has been rounded to a multiple of batch_size as well, so we truncate.
if all_losses is not None:
all_losses = all_losses[:num_samples]
if all_preds is not None:
all_preds = nested_truncate(all_preds, num_samples)
if all_labels is not None:
all_labels = nested_truncate(all_labels, num_samples)
# Metrics!
if self.compute_metrics is not None and all_preds is not None and all_labels is not None:
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels,
data_info=self.get_data_info(metric_key_prefix)))
else:
metrics = {}
# To be JSON-serializable, we need to remove numpy types or zero-d tensors
metrics = denumpify_detensorize(metrics)
if all_losses is not None:
metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
# Prefix all keys with metric_key_prefix + '_'
for key in list(metrics.keys()):
if not key.startswith(f"{metric_key_prefix}_"):
metrics[f"{metric_key_prefix}_{key}"] = metrics.pop(key)
return EvalLoopOutput(predictions=all_preds, label_ids=all_labels, metrics=metrics, num_samples=num_samples)
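# A minimal compute_metrics sketch for reference (hypothetical helper, assuming single-label
# classification logits); the extra ``data_info`` field carried by EvalPrediction is
# available to such a function but left unused here.
def example_compute_metrics(eval_pred):
    import numpy as np
    logits = eval_pred.predictions
    if isinstance(logits, tuple):  # some models return (logits, ...) tuples
        logits = logits[0]
    preds = np.argmax(logits, axis=-1)
    accuracy = float((preds == eval_pred.label_ids).mean())
    return {"accuracy": accuracy}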


@ -0,0 +1,140 @@
from dataclasses import dataclass, field
from typing import Optional, List
from transformers import Seq2SeqTrainingArguments
# run_seq2seq parameters.
@dataclass
class TrainingArguments(Seq2SeqTrainingArguments):
print_num_parameters: Optional[bool] = field(default=False, metadata={"help": "If set, print the number of "
"parameters of the model."})
do_test: Optional[bool] = field(default=False, metadata={"help": "If set, evaluates the test performance."})
split_validation_test: Optional[bool] = field(default=False,
metadata={"help": "If set, for datasets that do not have a test set, we use the validation set as the "
"test set and build a new validation set either by splitting the original validation set "
"in half (for datasets with fewer than 10K samples) or by holding out 1K training "
"examples as the validation set (for larger datasets)."})
compute_time: Optional[bool] = field(default=False, metadata={"help": "If set, measures the elapsed time."})
compute_memory: Optional[bool] = field(default=False, metadata={"help": "If set, measures the memory usage."})
# prefix_length: Optional[int] = field(default=100, metadata={"help": "Defines the length for prefix tuning."})
@dataclass
class DataTrainingArguments:
"""
Arguments pertaining to what data we are going to input our model for training and eval.
"""
task_name: Optional[str] = field(
default=None, metadata={"help": "The name of the dataset to use (via the datasets library)."}
)
dataset_config_name: Optional[str] = field(
default=None, metadata={"help": "The configuration name of the dataset to use (via the datasets library)."}
)
eval_dataset_name: Optional[str] = field(
default=None, metadata={"help": "The name of the evaluation dataset to use (via the datasets library)."}
)
eval_dataset_config_name: Optional[str] = field(
default=None, metadata={"help": "The configuration name of the evaluation dataset to use (via the datasets library)."}
)
test_dataset_name: Optional[str] = field(
default=None, metadata={"help": "The name of the test dataset to use (via the datasets library)."}
)
test_dataset_config_name: Optional[str] = field(
default=None, metadata={"help": "The configuration name of the test dataset to use (via the datasets library)."}
)
overwrite_cache: bool = field(
default=False, metadata={"help": "Overwrite the cached training and evaluation sets"}
)
preprocessing_num_workers: Optional[int] = field(
default=None,
metadata={"help": "The number of processes to use for the preprocessing."},
)
max_source_length: Optional[int] = field(
default=128,
metadata={
"help": "The maximum total input sequence length after tokenization. Sequences longer "
"than this will be truncated, sequences shorter will be padded."
},
)
max_target_length: Optional[int] = field(
default=128,
metadata={
"help": "The maximum total sequence length for target text after tokenization. Sequences longer "
"than this will be truncated, sequences shorter will be padded."
},
)
val_max_target_length: Optional[int] = field(
default=None,
metadata={
"help": "The maximum total sequence length for validation target text after tokenization. Sequences longer "
"than this will be truncated, sequences shorter will be padded. Will default to `max_target_length`."
"This argument is also used to override the ``max_length`` param of ``model.generate``, which is used "
"during ``evaluate`` and ``predict``."
},
)
test_max_target_length: Optional[int] = field(
default=None,
metadata={
"help": "The maximum total sequence length for test target text after tokenization. Sequences longer "
"than this will be truncated, sequences shorter will be padded. Will default to `max_target_length`."
"This argument is also used to override the ``max_length`` param of ``model.generate``, which is used "
"during ``evaluate`` and ``predict``."
},
)
pad_to_max_length: bool = field(
default=False,
metadata={
"help": "Whether to pad all samples to model maximum sentence length. "
"If False, will pad the samples dynamically when batching to the maximum length in the batch. More "
"efficient on GPU but very bad for TPU."
},
)
max_train_samples: Optional[int] = field(
default=None,
metadata={
"help": "For debugging purposes or quicker training, truncate the number of training examples to this "
"value if set."
},
)
max_val_samples: Optional[int] = field(
default=None,
metadata={
"help": "For debugging purposes or quicker training, truncate the number of validation examples to this "
"value if set."
},
)
max_test_samples: Optional[int] = field(
default=None,
metadata={"help": "For debugging purposes or quicker training, truncate the number of test examples to this "
"value if set."}
)
num_beams: Optional[int] = field(default=None, metadata={"help": "Number of beams to use for evaluation."})
ignore_pad_token_for_loss: bool = field(
default=True,
metadata={
"help": "Whether to ignore the tokens corresponding to padded labels in the loss computation or not."
},
)
task_adapters: Optional[List[str]] = field(
default=None,
metadata={"help": "Defines a dictionary from task adapters to the tasks."}
)
task_embeddings: Optional[List[str]] = field(
default=None,
metadata={"help": "Defines a dictionary from tasks to the tasks embeddings."}
)
data_seed: Optional[int] = field(default=42, metadata={"help": "Seed used to shuffle the data."})
model_parallel: Optional[bool] = field(default=False, metadata={"help": "Whether to apply model parallelism."})
def __post_init__(self):
if self.task_name is None:
raise ValueError("Need either a dataset name or a training/validation file.")
if self.val_max_target_length is None:
self.val_max_target_length = self.max_target_length
if self.test_max_target_length is None:
self.test_max_target_length = self.max_target_length
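# Minimal usage sketch (the config path below is hypothetical): these dataclasses are
# typically populated by transformers.HfArgumentParser, e.g. from a JSON configuration
# file such as the ones written by config_gen.py in the examples.
if __name__ == "__main__":
    from transformers import HfArgumentParser
    parser = HfArgumentParser((TrainingArguments, DataTrainingArguments))
    training_args, data_args = parser.parse_json_file(json_file="some_config.json")
    print(training_args.output_dir, data_args.task_name)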


@ -0,0 +1,75 @@
import numpy as np
from typing import Union, NamedTuple, Tuple, Dict, Any
import os
import regex as re
import logging
from dataclasses import fields
import torch.nn as nn
import json
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
class EvalPrediction(NamedTuple):
"""
Evaluation output (always contains labels), to be used to compute metrics.
Parameters:
predictions (:obj:`np.ndarray`): Predictions of the model.
label_ids (:obj:`np.ndarray`): Targets to be matched.
data_info (:obj:`Dict[str, Any]`): Extra dataset information required to
perform the evaluation. data_info is a dictionary with the keys train,
eval, and test, giving the extra information for each split of the dataset.
"""
predictions: Union[np.ndarray, Tuple[np.ndarray]]
label_ids: np.ndarray
data_info: Dict[str, Any]
def create_dir(output_dir):
"""
Checks whether the output_dir already exists and creates it if not.
Args:
output_dir: path to the output_dir
"""
if not os.path.exists(output_dir):
os.makedirs(output_dir)
def get_last_checkpoint(output_dir):
if os.path.exists(os.path.join(output_dir, 'pytorch_model.bin')):
return output_dir
return None
def pad_punctuation(text):
"""Re-implementation of _pad_punctuation in t5. This function adds spaces
around punctuation. While this pads punctuation as expected, it has the
unexpected effected of padding certain unicode characters with accents, with
spaces as well. For instance: "François" becomes "Fran ç ois"""
# Pad everything except for: underscores (_), whitespace (\s),
# numbers (\p{N}), letters (\p{L}) and accent characters (\p{M}).
text = re.sub(r'([^_\s\p{N}\p{L}\p{M}])', r' \1 ', text)
# Collapse consecutive whitespace into one space.
text = re.sub(r'\s+', ' ', text)
return text
def save_json(filepath, dictionary):
with open(filepath, "w") as outfile:
json.dump(dictionary, outfile)
def read_json(filepath):
with open(filepath) as f:
return json.load(f)
def save_training_config(config_file, output_dir):
json_data = read_json(config_file)
save_json(os.path.join(output_dir, "training_config.json"), json_data)
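# Quick usage sketch (hypothetical values): pad_punctuation surrounds punctuation with
# spaces while leaving letters, digits, underscores and accents untouched; save_json and
# read_json do a plain JSON round-trip to disk.
if __name__ == "__main__":
    print(pad_punctuation("delta-tuning: freeze most, tune few."))
    # -> roughly "delta - tuning : freeze most , tune few . "
    save_json("demo_config.json", {"delta_type": "lora", "lora_r": 8})
    print(read_json("demo_config.json"))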


@ -0,0 +1,15 @@
import os
import regex as re
import logging
from dataclasses import fields
import torch.nn as nn
import json
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


@ -0,0 +1,58 @@
# Text classification with OpenDelta
This directory contains examples that use OpenDelta for text classification in the traditional classification mode, i.e., with a classification head on top of the language model. Almost all of the training pipeline code remains the same; only minimal changes are needed to insert a delta model into the backbone model, as in the sketch below.
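For reference, the minimal change typically looks like the following sketch (assuming a RoBERTa backbone with LoRA; consult the OpenDelta documentation for the exact interface of your version):
```
from transformers import AutoModelForSequenceClassification
from opendelta import LoraModel

model = AutoModelForSequenceClassification.from_pretrained("roberta-base")
delta_model = LoraModel(backbone_model=model)                 # attach LoRA modules
delta_model.freeze_module(exclude=["deltas", "classifier"])   # train only deltas + head
delta_model.log()                                             # inspect the modified model
```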
## Generating the json configuration file
```
python config_gen.py --job $job_name
```
The available job configurations (e.g., `--job lora_roberta-base`) can be found in `config_gen.py`. You can also
create your own configuration.
## Run the code
```
python run_glue.py configs/$job_name/$dataset.json
```
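For example, to train RoBERTa-base with LoRA on RTE (assuming the generated `lora_roberta-base/` folder is placed under `configs/`):
```
python config_gen.py --job lora_roberta-base
python run_glue.py configs/lora_roberta-base/rte.json
```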
## Possible Errors
1.
```
ValueError: You must login to the Hugging Face hub on this computer by typing `transformers-cli login` and entering your credentials to use `use_auth_token=True`. Alternatively, you can pass your own token as the `use_auth_token` argument.
```
- Solution 1: Register an account on [HuggingFace](https://huggingface.co/), then run `transformers-cli login` on your command line and enter your username and password.
- Solution 2: Disable pushing to the hub by setting `"push_to_hub": false` in the config JSON.
2.
```
OSError: Looks like you do not have git-lfs installed, please install. You can install from https://git-lfs.github.com/. Then run `git lfs install` (you only have to do this once).
```
- Solution 1:
```
wget -P ~ https://github.com/git-lfs/git-lfs/releases/download/v3.0.2/git-lfs-linux-amd64-v3.0.2.tar.gz
cd ~
tar -xvzf git-lfs-linux-amd64-v3.0.2.tar.gz
export PATH=~:$PATH # a temporary fix; to make it permanent, add this line to your ~/.bashrc
git-lfs install
```
- Solution 2: Disable pushing to the hub by setting `"push_to_hub": false` in the config JSON.
3. Dataset connection error
- Solution 1: Open a Python console and rerun the failing command; this may or may not help.
- Solution 2: Download the dataset yourself on an internet-connected machine, save it to disk, transfer it to your server, and finally load it with `load_from_disk`, as in the sketch below.
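A rough sketch of Solution 2 with the `datasets` library (dataset name and paths are only examples):
```
# on a machine with internet access
from datasets import load_dataset
load_dataset("super_glue", "rte").save_to_disk("superglue_rte")

# on the offline server, after copying the folder over
from datasets import load_from_disk
dataset = load_from_disk("superglue_rte")
```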
## Link to the original training scripts
This example repo is based on the [huggingface text-classification example](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification). Thanks to the authors of the original repo.


@ -0,0 +1,342 @@
import collections
import copy
AllConfigs = {}
BaseConfigs = {}
BaseConfigs['roberta-base'] = {
("job_name", "task_name", "eval_dataset_name", "test_dataset_name", "num_train_epochs",
"max_source_length",
"per_device_train_batch_size", "per_device_eval_batch_size", "warmup_steps","save_steps", "eval_steps", "metric_for_best_model"): zip(
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record",
"superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record", "superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record", "superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record", "superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
[ 20, 20, 40, 20, 3, 3, 20, 20, 20, 3, 3, 20, 3, 3, 20],
[256, 256, 256, 256, 256, 512, 256, 128, 128, 128, 128, 128, 128, 128, 128],
[ 32, 32, 32, 32, 32, 16, 32] + [32] * 8,
[ 32, 32, 32, 32, 32, 16, 32] + [32] * 8,
[0] *7 +[0] *8,
[200, 100, 50, 100, 200, 200, 100, 200, 100, 200, 200, 100, 200, 200, 100],
[200, 100, 50, 100, 200, 200, 100, 200, 100, 200, 200, 100, 200, 200, 100],
["eval_accuracy"] *15,
),
"do_train": True,
"do_eval": True,
"do_test": True,
"model_name_or_path": "roberta-base",
"tokenizer_name": "roberta-base",
"save_total_limit": 1,
# For glue datasets.
# "split_validation_test": True,
"seed": 42,
"dataset_config_name": ["en"],
"eval_dataset_config_name": ["en"],
"test_dataset_config_name": ["en"],
# other configurations.
"predict_with_generate": True,
# To evaluate during training.
"load_best_model_at_end": True,
# "metric_for_best_model": "average_metrics",
"greater_is_better": True,
"evaluation_strategy": "steps",
"overwrite_output_dir": True,
"push_to_hub": True,
"save_strategy": "steps"
}
BaseConfigs['deberta-base'] = {
("job_name", "task_name", "eval_dataset_name", "test_dataset_name", "num_train_epochs",
"max_source_length",
"per_device_train_batch_size", "per_device_eval_batch_size", "warmup_steps","save_steps", "eval_steps", "metric_for_best_model"): zip(
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record",
"superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record", "superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record", "superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record", "superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
[ 20, 20, 40, 20, 3, 3, 20, 20, 20, 3, 3, 20, 3, 3, 20],
[256, 256, 256, 256, 256, 512, 256, 128, 128, 128, 128, 128, 128, 128, 128],
[ 32, 32, 32, 32, 32, 16, 32] + [32] * 8,
[ 32, 32, 32, 32, 32, 16, 32] + [32] * 8,
[0] *7 +[0] *8,
[200, 100, 50, 100, 200, 200, 100, 200, 100, 200, 200, 100, 200, 200, 100],
[200, 100, 50, 100, 200, 200, 100, 200, 100, 200, 200, 100, 200, 200, 100],
["eval_accuracy"] *15,
),
"do_train": True,
"do_eval": True,
"do_test": True,
"model_name_or_path": "microsoft/deberta-v3-base",
"tokenizer_name": "microsoft/deberta-v3-base",
"save_total_limit": 1,
# For glue datasets.
# "split_validation_test": True,
"seed": 42,
"dataset_config_name": ["en"],
"eval_dataset_config_name": ["en"],
"test_dataset_config_name": ["en"],
# other configurations.
"predict_with_generate": True,
# To evaluate during training.
"load_best_model_at_end": True,
# "metric_for_best_model": "average_metrics",
"greater_is_better": True,
"evaluation_strategy": "steps",
"overwrite_output_dir": True,
"push_to_hub": True,
"save_strategy": "steps"
}
BaseConfigs['deberta-v2-xlarge'] = {
("job_name", "task_name", "eval_dataset_name", "test_dataset_name", "num_train_epochs",
"max_source_length",
"per_device_train_batch_size", "per_device_eval_batch_size", "warmup_steps","save_steps", "eval_steps", "metric_for_best_model", "gradient_accumulation_steps"): zip(
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record",
"superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record", "superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record", "superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
["superglue-boolq", "superglue-cb", "superglue-copa", "superglue-wic", "superglue-multirc", "superglue-record", "superglue-wsc.fixed", "mrpc", "cola", "sst2", "qnli", "rte", "mnli", "qqp", "stsb"],
[ 20, 20, 40, 20, 3, 3, 20, 20, 20, 3, 3, 20, 3, 3, 20],
[256, 256, 256, 256, 256, 512, 256, 128, 128, 128, 128, 128, 128, 128, 128],
[ 16, 16, 16, 16, 16, 8, 16] + [16] * 8,
[ 16, 16, 16, 16, 16, 8, 16] + [16] * 8,
[0] *7 +[0] *8,
[200, 100, 50, 100, 200, 200, 100, 200, 100, 200, 200, 100, 200, 200, 100],
[200, 100, 50, 100, 200, 200, 100, 200, 100, 200, 200, 100, 200, 200, 100],
["eval_accuracy"] *15,
[4] *15,
),
"do_train": True,
"do_eval": True,
"do_test": True,
"model_name_or_path": "microsoft/deberta-v2-xlarge",
"tokenizer_name": "microsoft/deberta-v2-xlarge",
"save_total_limit": 1,
# For glue datasets.
# "split_validation_test": True,
"seed": 42,
"dataset_config_name": ["en"],
"eval_dataset_config_name": ["en"],
"test_dataset_config_name": ["en"],
# other configurations.
"predict_with_generate": True,
# To evaluate during training.
"load_best_model_at_end": True,
# "metric_for_best_model": "average_metrics",
"greater_is_better": True,
"evaluation_strategy": "steps",
"overwrite_output_dir": True,
"push_to_hub": True,
"save_strategy": "steps"
}
AllConfigs['bitfit_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base'])
AllConfigs['bitfit_roberta-base'].update({
"delta_type": "bitfit",
"learning_rate": 3e-4,
"output_dir": "outputs/bitfit/roberta-base/",
"unfrozen_modules": [
"classifier",
"deltas"
],
})
AllConfigs['adapter_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base'])
AllConfigs['adapter_roberta-base'].update({
"delta_type": "adapter",
"learning_rate": 3e-4,
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier",
],
"bottleneck_dim":24,
"output_dir": "outputs/adapter/roberta-base/",
})
AllConfigs['lora_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base'])
AllConfigs['lora_roberta-base'].update({
"delta_type": "lora",
"learning_rate": 3e-4,
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier",
],
"lora_r": 8,
"output_dir": "outputs/lora/roberta-base/",
})
AllConfigs['compacter_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base'])
AllConfigs['compacter_roberta-base'].update({
"delta_type": "compacter",
"learning_rate": 3e-3,
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier",
],
"output_dir": "outputs/compacter/roberta-base/",
"non_linearity": "gelu_new",
#Compacter.
"hypercomplex_division": 4,
"hypercomplex_adapters": True,
"hypercomplex_nonlinearity": "glorot-uniform",
# gradient clip and clamp
"gradient_clip": False,
"phm_clamp": False,
"normalize_phm_weight": False,
"learn_phm": True,
# shared one side
"factorized_phm": True,
"shared_phm_rule": False,
"factorized_phm_rule": False,
"phm_c_init": "normal",
"phm_init_range": 0.0001,
"use_bias_down_sampler": True,
"use_bias_up_sampler": True,
})
AllConfigs['compacter++_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base'])
AllConfigs['compacter++_roberta-base'].update({
"delta_type": "compacter",
"learning_rate": 3e-3,
"do_train": True,
"do_eval": True,
"do_test": True,
"modified_modules": [
"DenseReluDense"
],
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier",
],
"output_dir": "outputs/compacter++/roberta-base/",
"non_linearity": "gelu_new",
#Compacter.
"hypercomplex_division": 4,
"hypercomplex_adapters": True,
"hypercomplex_nonlinearity": "glorot-uniform",
# gradient clip and clamp
"gradient_clip": False,
"phm_clamp": False,
"normalize_phm_weight": False,
"learn_phm": True,
# shared one side
"factorized_phm": True,
"shared_phm_rule": False,
"factorized_phm_rule": False,
"phm_c_init": "normal",
"phm_init_range": 0.0001,
"use_bias_down_sampler": True,
"use_bias_up_sampler": True,
})
AllConfigs['low_rank_adapter_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base'])
AllConfigs['low_rank_adapter_roberta-base'].update({
"delta_type": "low_rank_adapter",
"learning_rate": 3e-4,
"unfrozen_modules": [
"deltas",
"layer_norm",
"final_layer_norm",
"classifier",
],
"output_dir": "outputs/low_rank_adapter/roberta-base/",
"non_linearity": "gelu_new",
"low_rank_w_init": "glorot-uniform",
"low_rank_rank": 1,
})
AllConfigs['soft_prompt_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base'])
AllConfigs['soft_prompt_roberta-base'].update({
"delta_type": "soft_prompt",
"learning_rate": 3e-2,
"soft_token_num":100,
"unfrozen_modules": [
"deltas",
"classifier",
],
"output_dir": "outputs/soft_prompt/roberta-base/",
})
AllConfigs['prefix_roberta-base'] = copy.deepcopy(BaseConfigs['roberta-base'])
AllConfigs['prefix_roberta-base'].update({
"delta_type": "prefix",
"learning_rate": 3e-4,
"unfrozen_modules": [
"deltas",
"classifier",
],
"output_dir": "outputs/prefix/roberta-base/",
})
AllConfigs['soft_prompt_deberta-v2-xlarge'] = copy.deepcopy(BaseConfigs['deberta-v2-xlarge'])
AllConfigs['soft_prompt_deberta-v2-xlarge'].update({
"delta_type": "soft_prompt",
"learning_rate": 3e-2,
"soft_token_num":100,
"unfrozen_modules": [
"deltas",
"classifier",
],
"output_dir": "outputs/soft_prompt/deberta-v2-xlarge/",
})
if __name__ == "__main__":
import argparse
import json
import os
parser = argparse.ArgumentParser("Parser to generate configuration")
parser.add_argument("--job", type=str)
args = parser.parse_args()
config = AllConfigs[args.job]
Cartesian_product = []
for key in config:
if isinstance(key, tuple):
Cartesian_product.append(key)
all_config_jsons = {}
for key_tuple in Cartesian_product:
for zipped in config[key_tuple]:
job_name = zipped[0]
all_config_jsons[job_name] = {}
for key_name, zipped_elem in zip(key_tuple, zipped):
if key_name != 'job_name':
all_config_jsons[job_name][key_name] = zipped_elem
for key in config:
if not isinstance(key, tuple):
for job_name in all_config_jsons:
if key == "output_dir":
all_config_jsons[job_name][key] = config[key] + job_name
else:
all_config_jsons[job_name][key] = config[key]
if not os.path.exists(f"./{args.job}/"):
os.mkdir(f"./{args.job}/")
for job_name in all_config_jsons:
with open(f"./{args.job}/{job_name}.json", 'w') as fout:
json.dump(all_config_jsons[job_name], fout, indent=4,sort_keys=True)
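# For example, ``python config_gen.py --job lora_roberta-base`` writes one JSON per task,
# e.g. ./lora_roberta-base/rte.json, whose contents look roughly like (abridged, illustrative):
# {
#     "delta_type": "lora",
#     "task_name": "rte",
#     "learning_rate": 3e-4,
#     "num_train_epochs": 20,
#     "lora_r": 8,
#     "unfrozen_modules": ["deltas", "layer_norm", "final_layer_norm", "classifier"],
#     "output_dir": "outputs/lora/roberta-base/rte"
# }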
