diff --git a/docs/source/notes/explored_config.md b/docs/source/notes/explored_config.md
index f5d9052..3855376 100644
--- a/docs/source/notes/explored_config.md
+++ b/docs/source/notes/explored_config.md
@@ -1,7 +1,7 @@
 (favoredconfiguration)=
 # Favored Configuration
 
-Generally, the default configurations are already good enough. If we want squeeze the size of delta models further, you can refer to the following papers.
+Generally, the default configurations are already good enough. If you want to squeeze the size of delta models further, you can refer to the following papers.
 
 - [AdapterDrop: On the Efficiency of Adapters in Transformers](https://arxiv.org/abs/2010.11918)
 - [Sparse Structure Search for Parameter-Efficient Tuning(Delta Tuning)](https://arxiv.org/abs/2206.07382)
\ No newline at end of file
diff --git a/docs/source/notes/faq.md b/docs/source/notes/faq.md
index b7e4363..056399e 100644
--- a/docs/source/notes/faq.md
+++ b/docs/source/notes/faq.md
@@ -1,3 +1,9 @@
-# FAQ
+# FAQs
 
-1.
+1. **Why do I encounter a NotImplementedError in Prefix Tuning?**
+
+    This is because we have found no easy way to provide a unified Prefix Tuning implementation for different attention classes. If you really want to use Prefix Tuning for a model we have not yet supported, you can implement ``PrefixLayerYOURMODEL`` on your own or raise an issue to request the feature for your model.
+
+2. **Available Models with default configurations are ..., Please manually add the delta models by specifying 'modified_modules' based on the visualization of your model structure**
+
+    Although most pre-trained models (PTMs) use the transformer architecture, they are implemented differently. For example, the attention modules in GPT2 and BERT are not only named differently but also implemented in different ways. Common structure mapping maps the different naming conventions of different PTMs into a unified naming convention, but there are many PTMs that we do not currently cover. Don't worry! For these models, you can figure out which modules you should modify by simply [visualizing the PTM](visualization), and then specify the `modified_modules` manually (see [name-based addressing](namebasedaddr)).
diff --git a/docs/source/notes/update.md b/docs/source/notes/update.md
index 7563a3e..2e642f7 100644
--- a/docs/source/notes/update.md
+++ b/docs/source/notes/update.md
@@ -22,4 +22,4 @@
 ## Version 0.2.4
 ### Updates
 - examples/examples_seq2seq and examples/examples_text-classification is depreciated and moved to [legacy](https://github.com/thunlp/OpenDelta/tree/main/examples/legacies)
-- we provide [examples_prompt](https://github.com/thunlp/OpenDelta/tree/main/examples/examples_prompt), as a cleaner and more general framework, which unifies the delta tuning paradigm and the prompt-tuning paradigm. It is still based on [Huggingface Trainers](https://huggingface.co/docs/transformers/main_classes/trainer). In these examples, the core pipeline is [using unified scripts](https://github.com/thunlp/OpenDelta/tree/main/examples/examples_prompt/src), the difference in tasks, models, delta tuning models, and even prompt-tuning paradigms are [seperated and be more indepent](https://github.com/thunlp/OpenDelta/tree/main/examples/examples_prompt/backbones). Please try it out!
\ No newline at end of file
+- we provide [examples_prompt](https://github.com/thunlp/OpenDelta/tree/main/examples/examples_prompt) as a cleaner and more general framework, which unifies the delta tuning paradigm and the prompt-tuning paradigm. It is still based on [Huggingface Trainers](https://huggingface.co/docs/transformers/main_classes/trainer). In this example framework, the running pipeline is [a unified script](https://github.com/thunlp/OpenDelta/tree/main/examples/examples_prompt/src), while the differences in tasks, models, delta tuning models, and even prompt-tuning paradigms are kept [more modular and independent](https://github.com/thunlp/OpenDelta/tree/main/examples/examples_prompt/backbones). Please try it out!
\ No newline at end of file
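
The new FAQ entry describes a visualize-then-specify workflow for unsupported backbones. A minimal sketch of that workflow, assuming the `Visualization` and `LoraModel` interfaces shown in the OpenDelta README; the `gpt2` backbone and the `attn.c_attn` module name are purely illustrative and should be replaced with names read off your own model's printed structure:

```python
# Rough sketch of the manual workflow from the FAQ: inspect the backbone
# first, then pass `modified_modules` by hand.
from transformers import AutoModel
from opendelta import Visualization, LoraModel

# Illustrative backbone; swap in the PTM you actually use.
backbone = AutoModel.from_pretrained("gpt2")

# Step 1: visualize the backbone to find the submodule names.
Visualization(backbone).structure_graph()

# Step 2: attach delta modules to the chosen submodules by name.
# "attn.c_attn" is only an example taken from GPT-2's structure graph.
delta_model = LoraModel(backbone_model=backbone, modified_modules=["attn.c_attn"])

# Step 3: freeze everything except the delta parameters before training.
delta_model.freeze_module(exclude=["deltas"])
delta_model.log()  # print which modules were modified / frozen
```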