Challenges and Opportunities of Using Transformer-Based Multi-Task
Learning in NLP Through ML Lifecycle: A Survey
- URL: http://arxiv.org/abs/2308.08234v1
- Date: Wed, 16 Aug 2023 09:11:00 GMT
- Title: Challenges and Opportunities of Using Transformer-Based Multi-Task
Learning in NLP Through ML Lifecycle: A Survey
- Authors: Lovre Torbarina, Tin Ferkovic, Lukasz Roguski, Velimir Mihelcic, Bruno
Sarlija, Zeljko Kraljevic
- Abstract summary: Multi-Task Learning (MTL) has emerged as a promising approach to improve efficiency and performance through joint training.
We discuss the challenges and opportunities of using MTL approaches throughout typical machine learning lifecycle phases.
We believe it would be practical to have a model that can handle both MTL and continual learning.
- Score: 0.6240603866868214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing adoption of natural language processing (NLP) models across
industries has led to practitioners' need for machine learning systems to
handle these models efficiently, from training to serving them in production.
However, training, deploying, and updating multiple models can be complex,
costly, and time-consuming, especially when using transformer-based pre-trained
language models. Multi-Task Learning (MTL) has emerged as a promising approach
to improve efficiency and performance through joint training, rather than
training separate models. Motivated by this, we first provide an overview of
transformer-based MTL approaches in NLP. Then, we discuss the challenges and
opportunities of using MTL approaches throughout typical ML lifecycle phases,
specifically focusing on the challenges related to data engineering, model
development, deployment, and monitoring phases. This survey focuses on
transformer-based MTL architectures and, to the best of our knowledge, is novel
in that it systematically analyses how transformer-based MTL in NLP fits into
ML lifecycle phases. Furthermore, we motivate research on the connection
between MTL and continual learning (CL), as this area remains unexplored. We
believe it would be practical to have a model that can handle both MTL and CL,
as this would make it easier to periodically re-train the model, update it due
to distribution shifts, and add new capabilities to meet real-world
requirements.
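To make the joint-training idea above concrete, the following is a minimal sketch of a transformer-based multi-task setup: one shared encoder with a lightweight head per task, trained by sampling a task at each step. This is an illustrative assumption in plain PyTorch, not an architecture taken from the survey; the task names, dimensions, and sampling scheme are all hypothetical.

```python
# Minimal illustrative sketch (not from the paper): a shared transformer
# encoder with one small head per task, trained jointly by sampling tasks.
# Task names, dimensions, and the sampling scheme are assumptions.
import random
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    def __init__(self, task_num_labels, vocab_size=30522, d_model=256,
                 n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # shared across tasks
        # One classification head per task (task-specific parameters).
        self.heads = nn.ModuleDict(
            {t: nn.Linear(d_model, n) for t, n in task_num_labels.items()})

    def forward(self, input_ids, task):
        h = self.encoder(self.embed(input_ids))   # (batch, seq, d_model)
        pooled = h.mean(dim=1)                    # simple mean pooling
        return self.heads[task](pooled)           # task-specific logits

model = SharedEncoderMTL({"sentiment": 2, "topic": 4})
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss_fn = nn.CrossEntropyLoss()

# Joint training: sample a task per step and update the shared encoder
# plus that task's head (random batches stand in for real data).
for step in range(10):
    task = random.choice(["sentiment", "topic"])
    ids = torch.randint(0, 30522, (8, 16))
    labels = torch.randint(0, model.heads[task].out_features, (8,))
    loss = loss_fn(model(ids, task), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a setup like this, registering a new entry in `heads` later without retraining the shared encoder from scratch is one simple way such a model could be extended, which is roughly where the connection between MTL and continual learning that the authors motivate becomes relevant.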
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- MLLM-FL: Multimodal Large Language Model Assisted Federated Learning on Heterogeneous and Long-tailed Data [25.45278447786954]
We introduce a novel federated learning framework, named Multimodal Large Language Model Assisted Federated Learning (MLLM-FL).
Our framework is adept at harnessing the extensive, yet previously underexploited, open-source data accessible from websites and powerful server-side computational resources.
arXiv Detail & Related papers (2024-09-09T21:04:16Z)
- MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models.
MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z)
- SoupLM: Model Integration in Large Language and Multi-Modal Models [51.12227693121004]
Training large language models (LLMs) requires significant computing resources.
Existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks.
arXiv Detail & Related papers (2024-07-11T05:38:15Z)
- Federated Transfer Learning with Task Personalization for Condition Monitoring in Ultrasonic Metal Welding [3.079885946230076]
This paper presents a Federated Transfer Learning with Task Personalization (FTL-TP) architecture that provides condition-monitoring capabilities in a distributed learning framework.
The FTL-TP framework is readily adaptable to various other manufacturing applications.
arXiv Detail & Related papers (2024-04-20T05:31:59Z)
- How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes [6.652837942112205]
Large language models (LLMs) have recently shown an extraordinary ability to perform unseen tasks based on few-shot examples provided as text.
We propose several effective curriculum learning strategies that allow ICL models to achieve higher data efficiency and more stable convergence.
Our experiments reveal that ICL models can effectively learn difficult tasks by training on progressively harder tasks while mixing in prior tasks, denoted as mixed curriculum in this work; a rough sketch of this sampling idea appears after the related-papers list.
arXiv Detail & Related papers (2024-04-04T16:15:23Z)
- Model Composition for Multimodal Large Language Models [71.5729418523411]
We propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model.
Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters.
arXiv Detail & Related papers (2024-02-20T06:38:10Z)
- Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks [27.59758964060561]
Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities.
Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent.
We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
arXiv Detail & Related papers (2024-01-27T03:03:30Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to simultaneous machine translation (SimulMT) tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- On Conditional and Compositional Language Model Differentiable Prompting [75.76546041094436]
Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks.
We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata, into continuous prompts.
arXiv Detail & Related papers (2023-07-04T02:47:42Z)
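The "mixed curriculum" mentioned in the in-context-learning entry above can be pictured with a small sampling routine: tasks are introduced from easy to hard, and each step trains either on the newest task or on a replayed earlier one. The task names, replay probability, and stage length below are assumptions for illustration, not the paper's actual schedule.

```python
# Hedged sketch of a "mixed curriculum" sampler: tasks are unlocked in an
# assumed order of difficulty, and each step mixes the newest task with
# replays of previously introduced ones. All constants are assumptions.
import random

tasks_by_difficulty = ["linear", "sparse_linear", "decision_tree", "mlp"]
replay_prob = 0.3          # chance of revisiting an earlier task
steps_per_stage = 1000     # training steps before unlocking the next task

def sample_task(step):
    stage = min(step // steps_per_stage, len(tasks_by_difficulty) - 1)
    if stage > 0 and random.random() < replay_prob:
        return random.choice(tasks_by_difficulty[:stage])  # replay an old task
    return tasks_by_difficulty[stage]                      # current hardest task

# Example: inspect which tasks a few training steps would draw.
print([sample_task(s) for s in (0, 500, 1500, 2500, 3500)])
```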