Extreme Multi-Domain, Multi-Task Learning With Unified Text-to-Text
Transfer Transformers
- URL: http://arxiv.org/abs/2209.10106v1
- Date: Wed, 21 Sep 2022 04:21:27 GMT
- Title: Extreme Multi-Domain, Multi-Task Learning With Unified Text-to-Text
Transfer Transformers
- Authors: Adebayo Oshingbesan, Courage Ekoh, Germann Atakpa, Yonah Byaruagaba
- Abstract summary: We investigated the behavior of multi-domain, multi-task learning using multi-domain text-to-text transfer transformers (MD-T5).
We carried out experiments using three popular training strategies: BERT-style joint pretraining + successive finetuning, GPT-style joint pretraining + successive finetuning, and GPT-style joint pretraining + joint finetuning.
We show that while negative knowledge transfer and catastrophic forgetting are still considerable challenges for all the models, the GPT-style joint pretraining + joint finetuning strategy showed the most promise in multi-domain, multi-task learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Text-to-text transformers have shown remarkable success in the task of
multi-task transfer learning, especially in natural language processing (NLP).
However, while there have been several attempts to train transformers on
different domains, there is usually a clear relationship between these domains,
e.g., code summarization, where the natural language summary describes the
code. There have been very few attempts to study how multi-task transfer
learning works on tasks in significantly different domains. In this project, we
investigated the behavior of multi-domain, multi-task learning using
multi-domain text-to-text transfer transformers (MD-T5) on four tasks across
two domains - Python Code and Chess. We carried out extensive experiments using
three popular training strategies: BERT-style joint pretraining + successive
finetuning, GPT-style joint pretraining + successive finetuning, and GPT-style
joint pretraining + joint finetuning. We also evaluated the models on four
metrics - Play Score, Eval Score, BLEU Score, and Multi-Domain Learning Score
(MDLS). These metrics measure performance across the various tasks and
multi-domain learning. We show that while negative knowledge transfer and
catastrophic forgetting are still considerable challenges for all the models,
the GPT-style joint pretraining + joint finetuning strategy showed the most
promise in multi-domain, multi-task learning as it performs well across all
four tasks while still keeping its multi-domain knowledge.
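To make the text-to-text framing concrete, the sketch below casts toy examples from the two domains (Python code and chess) into prefixed source/target string pairs and interleaves them in a single training loop, in the spirit of the joint-finetuning strategy. It uses an off-the-shelf Hugging Face t5-small checkpoint as a stand-in for MD-T5; the task prefixes, toy examples, and hyperparameters are illustrative assumptions rather than the authors' exact setup.

```python
# Minimal sketch: multi-domain, multi-task text-to-text training with a
# Hugging Face T5 checkpoint standing in for MD-T5. Prefixes, examples, and
# hyperparameters are hypothetical, not the paper's actual configuration.
import random
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Every task in both domains is cast as text-to-text: a prefixed source string
# maps to a target string.
examples = [
    ("summarize python: def add(a, b): return a + b", "add two numbers"),
    ("generate python: add two numbers", "def add(a, b): return a + b"),
    ("play chess: 1. e4 e5 2. Nf3", "Nc6"),
    ("evaluate chess: 1. e4 e5 2. Nf3 Nc6", "roughly equal"),
]

# "Joint finetuning": examples from both domains are interleaved in one loop;
# "successive finetuning" would instead exhaust one domain before the next.
model.train()
for step in range(100):
    source, target = random.choice(examples)
    enc = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(input_ids=enc.input_ids,
                 attention_mask=enc.attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Under the successive-finetuning strategies, the same loop would run on one domain or task at a time, which is the regime where catastrophic forgetting of earlier domains is most likely to appear.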
Related papers
- Pilot: Building the Federated Multimodal Instruction Tuning Framework [79.56362403673354]
Our framework integrates two stages of "adapter on adapter" into the connector of the vision encoder and the LLM.
In stage 1, we extract task-specific features and client-specific features from visual information.
In stage 2, we build the cross-task Mixture-of-Adapters(CT-MoA) module to perform cross-task interaction.
arXiv Detail & Related papers (2025-01-23T07:49:24Z)
- On Domain-Specific Post-Training for Multimodal Large Language Models [72.67107077850939]
We develop a visual instruction synthesizer that generates diverse visual instruction tasks from domain-specific image-caption pairs.
We apply a single-stage training pipeline to enhance task diversity for domain-specific post-training.
We conduct experiments in two domains, biomedicine and food, by post-training MLLMs of different sources and scales.
arXiv Detail & Related papers (2024-11-29T18:42:28Z)
- DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines [15.332562681746081]
This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training.
We optimize micro-batch construction using a dynamic programming-based approach, and handle micro-batch execution time variation through dynamic pipeline and communication scheduling.
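(A toy sketch of this kind of dynamic-programming micro-batch packing appears at the end of this page.)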
arXiv Detail & Related papers (2023-11-17T09:48:45Z)
- TransPrompt v2: A Transferable Prompting Framework for Cross-task Text Classification [37.824031151922604]
We propose TransPrompt v2, a novel transferable prompting framework for few-shot learning across similar or distant text classification tasks.
For learning across similar tasks, we employ a multi-task meta-knowledge acquisition (MMA) procedure to train a meta-learner.
For learning across distant tasks, we inject the task type descriptions into the prompt, and capture the intra-type and inter-type prompt embeddings.
arXiv Detail & Related papers (2023-08-29T04:16:57Z)
- Compositional Zero-Shot Domain Transfer with Text-to-Text Models [65.32821642379066]
We propose a novel compositional transfer learning framework (DoT5) for zero-shot domain transfer.
Without access to in-domain labels, DoT5 jointly learns domain knowledge and task knowledge in a multi-task manner.
DoT5 demonstrates the effectiveness of compositional transfer learning through multi-task learning.
In particular, DoT5 outperforms the current SOTA in zero-shot transfer by over 7 absolute points in accuracy on RadNLI.
arXiv Detail & Related papers (2023-03-23T15:58:41Z)
- Domain Curricula for Code-Switched MT at MixMT 2022 [0.0]
We present our approach and results for the Code-mixed Machine Translation (MixMT) shared task at WMT 2022.
The task consists of two subtasks, monolingual to code-mixed machine translation (Subtask-1) and code-mixed to monolingual machine translation (Subtask-2).
We jointly learn multiple domains of text by pretraining and fine-tuning, combined with a sentence alignment objective.
arXiv Detail & Related papers (2022-10-31T16:41:57Z)
- MultiMatch: Multi-task Learning for Semi-supervised Domain Generalization [55.06956781674986]
We address the semi-supervised domain generalization (SSDG) task, where only a small amount of label information is available in each source domain.
We propose MultiMatch, which extends FixMatch to the multi-task learning framework to produce high-quality pseudo-labels for SSDG.
A series of experiments validate the effectiveness of the proposed method, and it outperforms the existing semi-supervised methods and the SSDG method on several benchmark DG datasets.
arXiv Detail & Related papers (2022-08-11T14:44:33Z)
- GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning [23.15735672234869]
We propose GPPF, a General Perception Pre-training Framework, to pre-train a task-level dynamic network.
By inspecting humans' innate ability to learn in complex environments, we recognize and transfer three critical elements to deep networks.
We develop a plug-and-play multi-task training algorithm, which supports Single Iteration Multiple Tasks (SIMT) concurrent training.
arXiv Detail & Related papers (2022-08-03T15:34:35Z)
- FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue [70.65782786401257]
This work explores conversational task transfer by introducing FETA: a benchmark for few-sample task transfer in open-domain dialogue.
FETA contains two underlying sets of conversations upon which there are 10 and 7 tasks annotated, enabling the study of intra-dataset task transfer.
We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs.
arXiv Detail & Related papers (2022-05-12T17:59:00Z)
- Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction [125.18248926508045]
We propose Channel-Exchanging-Network (CEN) which is self-adaptive, parameter-free, and more importantly, applicable for both multimodal fusion and multitask learning.
CEN dynamically exchanges channels between subnetworks of different modalities.
For the application of dense image prediction, the validity of CEN is tested by four different scenarios.
arXiv Detail & Related papers (2021-12-04T05:47:54Z)
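The DynaPipe entry above mentions dynamic-programming-based micro-batch construction for variable-length sequences. As a toy illustration of that general idea only, and not of DynaPipe's actual objective or algorithm, the sketch below partitions a fixed-order list of sequence lengths into contiguous micro-batches so as to minimize the total number of padded tokens, under a hypothetical cap on micro-batch size.

```python
# Toy dynamic-programming micro-batch packing: split a fixed-order list of
# sequence lengths into contiguous micro-batches, minimizing padded tokens.
# An illustration of the general idea only, not DynaPipe's algorithm.

def pack_microbatches(lengths, max_batch_size):
    n = len(lengths)
    INF = float("inf")
    best = [0.0] + [INF] * n       # best[i]: min padded-token cost of the first i sequences
    split = [0] * (n + 1)          # split[i]: start index of the last micro-batch
    for i in range(1, n + 1):
        longest = 0
        # The last micro-batch covers sequences j..i-1, at most max_batch_size of them.
        for j in range(i - 1, max(i - max_batch_size, 0) - 1, -1):
            longest = max(longest, lengths[j])
            cost = best[j] + (i - j) * longest  # each sequence padded to the longest
            if cost < best[i]:
                best[i] = cost
                split[i] = j
    # Reconstruct micro-batch boundaries from the split table.
    batches, i = [], n
    while i > 0:
        j = split[i]
        batches.append(lengths[j:i])
        i = j
    return best[n], list(reversed(batches))

# Example: short chess-move sequences mixed with longer code sequences.
cost, batches = pack_microbatches([8, 9, 64, 70, 12, 11, 10], max_batch_size=4)
print(cost, batches)
```

The quadratic recurrence is enough to show the shape of the problem; a production scheduler would also have to model pipeline and communication scheduling, as the DynaPipe summary notes.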