Divide et Impera: Multi-Transformer Architectures for Complex NLP-Tasks
- URL: http://arxiv.org/abs/2310.16897v1
- Date: Wed, 25 Oct 2023 18:00:15 GMT
- Title: Divide et Impera: Multi-Transformer Architectures for Complex NLP-Tasks
- Authors: Solveig Helland, Elena Gavagnin, Alexandre de Spindler
- Abstract summary: We present an approach in which complex tasks are divided into simpler subtasks.
Multiple transformer models are fine-tuned to one subtask each, and lined up to accomplish the complex task.
This simplifies the compilation of fine-tuning datasets and increases overall controllability.
- Score: 44.99833362998488
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The growing capabilities of transformer models pave the way for solving
increasingly complex NLP tasks. A key to supporting application-specific
requirements is the ability to fine-tune. However, compiling a fine-tuning
dataset tailored to complex tasks is tedious and results in large datasets,
limiting the ability to control transformer output. We present an approach in
which complex tasks are divided into simpler subtasks. Multiple transformer
models are fine-tuned to one subtask each, and lined up to accomplish the
complex task. This simplifies the compilation of fine-tuning datasets and
increases overall controllability. Using the example of reducing gender bias as
a complex task, we demonstrate our approach and show that it performs better
than using a single model.
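As a rough illustration of the divide-et-impera idea, the sketch below chains independently fine-tuned seq2seq models so that each handles one subtask and passes its output to the next. The checkpoint names and the two gender-bias subtasks are hypothetical placeholders, not the authors' released artifacts; this is a minimal sketch using the Hugging Face transformers library, not the paper's exact implementation.

```python
# Minimal sketch: one fine-tuned model per subtask, chained in sequence.
# Checkpoint names below are hypothetical placeholders.
from transformers import pipeline

SUBTASK_CHECKPOINTS = [
    "my-org/t5-detect-gendered-language",      # subtask 1: mark gendered phrases
    "my-org/t5-neutralize-gendered-language",  # subtask 2: rewrite them neutrally
]

# Load each fine-tuned subtask model once.
stages = [pipeline("text2text-generation", model=ckpt) for ckpt in SUBTASK_CHECKPOINTS]

def run_complex_task(text: str) -> str:
    """Accomplish the complex task by feeding each stage's output to the next."""
    for stage in stages:
        text = stage(text, max_new_tokens=128)[0]["generated_text"]
    return text

print(run_complex_task("Every chairman should bring his laptop."))
```

Because each stage has its own small fine-tuning dataset, a stage can be inspected, retrained, or swapped independently, which is the controllability benefit the abstract refers to.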
Related papers
- Context-Scaling versus Task-Scaling in In-Context Learning [17.36757113301424]
We analyze two key components of In-Context Learning (ICL): context-scaling and task-scaling.
While transformers are capable of both context-scaling and task-scaling, we empirically show that standard Multi-Layer Perceptrons (MLPs) with vectorized input are only capable of task-scaling.
arXiv Detail & Related papers (2024-10-16T17:58:08Z)
- Sampling Foundational Transformer: A Theoretical Perspective [12.7600763629179]
We propose the Sampling Foundational Transformer (SFT), which can work on multiple data modalities.
SFT achieves competitive results on many benchmarks while offering faster inference than more specialized models.
arXiv Detail & Related papers (2024-08-11T16:53:09Z)
- Adaptivity and Modularity for Efficient Generalization Over Task Complexity [42.748898521364914]
We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential steps.
We propose a transformer-based architecture called Hyper-UT, which combines dynamic function generation from hyper networks with adaptive depth from Universal Transformers.
arXiv Detail & Related papers (2023-10-13T05:29:09Z)
- Vision Transformer Adapters for Generalizable Multitask Learning [61.79647180647685]
We introduce the first multitasking vision transformer adapters that learn generalizable task affinities.
Our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner.
In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added.
arXiv Detail & Related papers (2023-08-23T18:40:48Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation [53.835365470800916]
We show how to efficiently and effectively transfer knowledge in a vision transformer.
We propose a consolidator that modifies the pre-trained model by adding a small set of tunable parameters.
Our consolidator can reach up to 7.56 points higher accuracy than full fine-tuning while tuning merely 0.35% of the parameters.
arXiv Detail & Related papers (2023-04-30T23:59:02Z)
- HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning [14.412066456583917]
We propose a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples.
Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal.
We extend our approach to a semi-supervised regime that utilizes unlabeled samples in the support set, further improving few-shot performance.
arXiv Detail & Related papers (2022-01-11T20:15:35Z)
- PolyViT: Co-training Vision Transformers on Images, Videos and Audio [80.0913507142036]
We present PolyViT, a single model trained on images, audio and video.
By co-training different tasks on a single modality, we are able to improve the accuracy of each individual task.
We show that co-training is simple and practical to implement.
arXiv Detail & Related papers (2021-11-25T10:01:05Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.