Divide et Impera: Multi-Transformer Architectures for Complex NLP-Tasks
- URL: http://arxiv.org/abs/2310.16897v1
- Date: Wed, 25 Oct 2023 18:00:15 GMT
- Title: Divide et Impera: Multi-Transformer Architectures for Complex NLP-Tasks
- Authors: Solveig Helland, Elena Gavagnin, Alexandre de Spindler
- Abstract summary: We present an approach in which complex tasks are divided into simpler subtasks.
Multiple transformer models are fine-tuned to one subtask each, and lined up to accomplish the complex task.
This simplifies the compilation of fine-tuning datasets and increases overall controllability.
- Score: 44.99833362998488
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The growing capabilities of transformer models pave the way for solving
increasingly complex NLP tasks. A key to supporting application-specific
requirements is the ability to fine-tune. However, compiling a fine-tuning
dataset tailored to complex tasks is tedious and results in large datasets,
limiting the ability to control transformer output. We present an approach in
which complex tasks are divided into simpler subtasks. Multiple transformer
models are fine-tuned to one subtask each, and lined up to accomplish the
complex task. This simplifies the compilation of fine-tuning datasets and
increases overall controllability. Using the example of reducing gender bias as
a complex task, we demonstrate our approach and show that it performs better
than using a single model.
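As a rough illustration of the divide-et-impera idea, the sketch below chains independently fine-tuned seq2seq models so that each handles one subtask and passes its output to the next. The checkpoint names and the two gender-bias subtasks are hypothetical placeholders, not the authors' released artifacts; this is a minimal sketch using the Hugging Face transformers library, not the paper's exact implementation.

```python
# Minimal sketch: one fine-tuned model per subtask, chained in sequence.
# Checkpoint names below are hypothetical placeholders.
from transformers import pipeline

SUBTASK_CHECKPOINTS = [
    "my-org/t5-detect-gendered-language",      # subtask 1: mark gendered phrases
    "my-org/t5-neutralize-gendered-language",  # subtask 2: rewrite them neutrally
]

# Load each fine-tuned subtask model once.
stages = [pipeline("text2text-generation", model=ckpt) for ckpt in SUBTASK_CHECKPOINTS]

def run_complex_task(text: str) -> str:
    """Accomplish the complex task by feeding each stage's output to the next."""
    for stage in stages:
        text = stage(text, max_new_tokens=128)[0]["generated_text"]
    return text

print(run_complex_task("Every chairman should bring his laptop."))
```

Because each stage has its own small fine-tuning dataset, a stage can be inspected, retrained, or swapped independently, which is the controllability benefit the abstract refers to.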
Related papers
- Context-Scaling versus Task-Scaling in In-Context Learning [17.36757113301424]
We analyze two key components of In-Context Learning (ICL): context-scaling and task-scaling.
While transformers are capable of both context-scaling and task-scaling, we empirically show that standard Multi-Layer Perceptrons (MLPs) with vectorized input are only capable of task-scaling.
arXiv Detail & Related papers (2024-10-16T17:58:08Z)
- Sampling Foundational Transformer: A Theoretical Perspective [12.7600763629179]
We propose the Sampling Foundational Transformer (SFT), which can work on multiple data modalities.
SFT achieves competitive results on many benchmarks while offering faster inference than more specialized models.
arXiv Detail & Related papers (2024-08-11T16:53:09Z)
- Adaptivity and Modularity for Efficient Generalization Over Task Complexity [42.748898521364914]
We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential steps.
We propose a transformer-based architecture called Hyper-UT, which combines dynamic function generation from hyper networks with adaptive depth from Universal Transformers.
arXiv Detail & Related papers (2023-10-13T05:29:09Z)
- Vision Transformer Adapters for Generalizable Multitask Learning [61.79647180647685]
We introduce the first multitasking vision transformer adapters that learn generalizable task affinities.
Our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner.
In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added.
arXiv Detail & Related papers (2023-08-23T18:40:48Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation [53.835365470800916]
We show how to efficiently and effectively transfer knowledge in a vision transformer.
We propose a consolidator that modifies the pre-trained model by adding a small set of tunable parameters.
Our consolidator can reach up to 7.56 points higher accuracy than full fine-tuning while tuning merely 0.35% of the parameters.
arXiv Detail & Related papers (2023-04-30T23:59:02Z)
- HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning [14.412066456583917]
We propose a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples.
Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal.
We extend our approach to a semi-supervised regime that utilizes unlabeled samples in the support set, further improving few-shot performance.
arXiv Detail & Related papers (2022-01-11T20:15:35Z)
- PolyViT: Co-training Vision Transformers on Images, Videos and Audio [80.0913507142036]
We present PolyViT, a single model trained on images, audio and video.
By co-training different tasks on a single modality, we are able to improve the accuracy of each individual task.
We show that co-training is simple and practical to implement.
arXiv Detail & Related papers (2021-11-25T10:01:05Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.