Extreme Multi-Domain, Multi-Task Learning With Unified Text-to-Text
Transfer Transformers
- URL: http://arxiv.org/abs/2209.10106v1
- Date: Wed, 21 Sep 2022 04:21:27 GMT
- Title: Extreme Multi-Domain, Multi-Task Learning With Unified Text-to-Text
Transfer Transformers
- Authors: Adebayo Oshingbesan, Courage Ekoh, Germann Atakpa, Yonah Byaruagaba
- Abstract summary: We investigated the behavior of multi-domain, multi-task learning using multi-domain text-to-text transfer transformers (MD-T5).
We carried out experiments using three popular training strategies: BERT-style joint pretraining + successive finetuning, GPT-style joint pretraining + successive finetuning, and GPT-style joint pretraining + joint finetuning.
We show that while negative knowledge transfer and catastrophic forgetting are still considerable challenges for all the models, the GPT-style joint pretraining + joint finetuning strategy showed the most promise in multi-domain, multi-task learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Text-to-text transformers have shown remarkable success in the task of
multi-task transfer learning, especially in natural language processing (NLP).
However, while there have been several attempts to train transformers on
different domains, there is usually a clear relationship between these domains,
e.g., code summarization, where the natural language summary describes the
code. There have been very few attempts to study how multi-task transfer
learning works on tasks in significantly different domains. In this project, we
investigated the behavior of multi-domain, multi-task learning using
multi-domain text-to-text transfer transformers (MD-T5) on four tasks across
two domains - Python Code and Chess. We carried out extensive experiments using
three popular training strategies: BERT-style joint pretraining + successive
finetuning, GPT-style joint pretraining + successive finetuning, and GPT-style
joint pretraining + joint finetuning. We also evaluated the models on four
metrics - Play Score, Eval Score, BLEU Score, and Multi-Domain Learning Score
(MDLS). These metrics measure performance across the various tasks and
multi-domain learning. We show that while negative knowledge transfer and
catastrophic forgetting are still considerable challenges for all the models,
the GPT-style joint pretraining + joint finetuning strategy showed the most
promise in multi-domain, multi-task learning as it performs well across all
four tasks while still keeping its multi-domain knowledge.
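To make the text-to-text framing concrete, the sketch below casts toy examples from the two domains (Python code and chess) into prefixed source/target string pairs and interleaves them in a single training loop, in the spirit of the joint-finetuning strategy. It uses an off-the-shelf Hugging Face t5-small checkpoint as a stand-in for MD-T5; the task prefixes, toy examples, and hyperparameters are illustrative assumptions rather than the authors' exact setup.

```python
# Minimal sketch: multi-domain, multi-task text-to-text training with a
# Hugging Face T5 checkpoint standing in for MD-T5. Prefixes, examples, and
# hyperparameters are hypothetical, not the paper's actual configuration.
import random
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Every task in both domains is cast as text-to-text: a prefixed source string
# maps to a target string.
examples = [
    ("summarize python: def add(a, b): return a + b", "add two numbers"),
    ("generate python: add two numbers", "def add(a, b): return a + b"),
    ("play chess: 1. e4 e5 2. Nf3", "Nc6"),
    ("evaluate chess: 1. e4 e5 2. Nf3 Nc6", "roughly equal"),
]

# "Joint finetuning": examples from both domains are interleaved in one loop;
# "successive finetuning" would instead exhaust one domain before the next.
model.train()
for step in range(100):
    source, target = random.choice(examples)
    enc = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(input_ids=enc.input_ids,
                 attention_mask=enc.attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Under the successive-finetuning strategies, the same loop would run on one domain or task at a time, which is the regime where catastrophic forgetting of earlier domains is most likely to appear.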
Related papers
- Pilot: Building the Federated Multimodal Instruction Tuning Framework [79.56362403673354]
Our framework integrates two stages of "adapter on adapter" into the connector of the vision encoder and the LLM.
In stage 1, we extract task-specific features and client-specific features from visual information.
In stage 2, we build the cross-task Mixture-of-Adapters(CT-MoA) module to perform cross-task interaction.
arXiv Detail & Related papers (2025-01-23T07:49:24Z)
- On Domain-Specific Post-Training for Multimodal Large Language Models [72.67107077850939]
We develop a visual instruction synthesizer that generates diverse visual instruction tasks from domain-specific image-caption pairs.
We apply a single-stage training pipeline to enhance task diversity for domain-specific post-training.
We conduct experiments in two domains, biomedicine and food, by post-training MLLMs of different sources and scales.
arXiv Detail & Related papers (2024-11-29T18:42:28Z)
- DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines [15.332562681746081]
This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training.
We optimize micro-batch construction using a dynamic programming-based approach, and handle micro-batch execution time variation through dynamic pipeline and communication scheduling.
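(A toy sketch of this kind of dynamic-programming micro-batch packing appears at the end of this page.)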
arXiv Detail & Related papers (2023-11-17T09:48:45Z)
- TransPrompt v2: A Transferable Prompting Framework for Cross-task Text Classification [37.824031151922604]
We propose TransPrompt v2, a novel transferable prompting framework for few-shot learning across similar or distant text classification tasks.
For learning across similar tasks, we employ a multi-task meta-knowledge acquisition (MMA) procedure to train a meta-learner.
For learning across distant tasks, we inject the task type descriptions into the prompt, and capture the intra-type and inter-type prompt embeddings.
arXiv Detail & Related papers (2023-08-29T04:16:57Z)
- Compositional Zero-Shot Domain Transfer with Text-to-Text Models [65.32821642379066]
We propose a novel compositional transfer learning framework (DoT5) for zero-shot domain transfer.
Without access to in-domain labels, DoT5 jointly learns domain knowledge and task knowledge in a multi-task manner.
DoT5 demonstrates the effectiveness of compositional transfer learning through multi-task learning.
In particular, DoT5 outperforms the current SOTA in zero-shot transfer by over 7 absolute points in accuracy on RadNLI.
arXiv Detail & Related papers (2023-03-23T15:58:41Z)
- Domain Curricula for Code-Switched MT at MixMT 2022 [0.0]
We present our approach and results for the Code-mixed Machine Translation (MixMT) shared task at WMT 2022.
The task consists of two subtasks, monolingual to code-mixed machine translation (Subtask-1) and code-mixed to monolingual machine translation (Subtask-2).
We jointly learn multiple domains of text by pretraining and fine-tuning, combined with a sentence alignment objective.
arXiv Detail & Related papers (2022-10-31T16:41:57Z)
- MultiMatch: Multi-task Learning for Semi-supervised Domain Generalization [55.06956781674986]
We address the semi-supervised domain generalization (SSDG) task, where only a small amount of label information is available in each source domain.
We propose MultiMatch, which extends FixMatch to the multi-task learning framework to produce high-quality pseudo-labels for SSDG.
A series of experiments validate the effectiveness of the proposed method, and it outperforms the existing semi-supervised methods and the SSDG method on several benchmark DG datasets.
arXiv Detail & Related papers (2022-08-11T14:44:33Z)
- GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning [23.15735672234869]
We propose GPPF, a General Perception Pre-training Framework, to pre-train a task-level dynamic network.
By inspecting humans' innate ability to learn in complex environments, we recognize and transfer three critical elements to deep networks.
We develop a plug-and-play multi-task training algorithm, which supports Single Iteration Multiple Tasks (SIMT) concurrent training.
arXiv Detail & Related papers (2022-08-03T15:34:35Z)
- FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue [70.65782786401257]
This work explores conversational task transfer by introducing FETA: a benchmark for few-sample task transfer in open-domain dialogue.
FETA contains two underlying sets of conversations upon which there are 10 and 7 tasks annotated, enabling the study of intra-dataset task transfer.
We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs.
arXiv Detail & Related papers (2022-05-12T17:59:00Z)
- Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction [125.18248926508045]
We propose Channel-Exchanging-Network (CEN) which is self-adaptive, parameter-free, and more importantly, applicable for both multimodal fusion and multitask learning.
CEN dynamically exchanges channels between subnetworks of different modalities.
For the application of dense image prediction, the validity of CEN is tested by four different scenarios.
arXiv Detail & Related papers (2021-12-04T05:47:54Z)
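The DynaPipe entry above mentions dynamic-programming-based micro-batch construction for variable-length sequences. As a toy illustration of that general idea only, and not of DynaPipe's actual objective or algorithm, the sketch below partitions a fixed-order list of sequence lengths into contiguous micro-batches so as to minimize the total number of padded tokens, under a hypothetical cap on micro-batch size.

```python
# Toy dynamic-programming micro-batch packing: split a fixed-order list of
# sequence lengths into contiguous micro-batches, minimizing padded tokens.
# An illustration of the general idea only, not DynaPipe's algorithm.

def pack_microbatches(lengths, max_batch_size):
    n = len(lengths)
    INF = float("inf")
    best = [0.0] + [INF] * n       # best[i]: min padded-token cost of the first i sequences
    split = [0] * (n + 1)          # split[i]: start index of the last micro-batch
    for i in range(1, n + 1):
        longest = 0
        # The last micro-batch covers sequences j..i-1, at most max_batch_size of them.
        for j in range(i - 1, max(i - max_batch_size, 0) - 1, -1):
            longest = max(longest, lengths[j])
            cost = best[j] + (i - j) * longest  # each sequence padded to the longest
            if cost < best[i]:
                best[i] = cost
                split[i] = j
    # Reconstruct micro-batch boundaries from the split table.
    batches, i = [], n
    while i > 0:
        j = split[i]
        batches.append(lengths[j:i])
        i = j
    return best[n], list(reversed(batches))

# Example: short chess-move sequences mixed with longer code sequences.
cost, batches = pack_microbatches([8, 9, 64, 70, 12, 11, 10], max_batch_size=4)
print(cost, batches)
```

The quadratic recurrence is enough to show the shape of the problem; a production scheduler would also have to model pipeline and communication scheduling, as the DynaPipe summary notes.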