Task-Based MoE for Multitask Multilingual Machine Translation
- URL: http://arxiv.org/abs/2308.15772v3
- Date: Tue, 24 Oct 2023 22:59:26 GMT
- Title: Task-Based MoE for Multitask Multilingual Machine Translation
- Authors: Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff,
Barnabas Poczos, Hany Hassan Awadalla
- Abstract summary: The Mixture-of-Experts (MoE) architecture has proven to be a powerful method for training deep models on diverse tasks across many applications.
In this work, we design a novel method that incorporates task information into MoE models at different levels of granularity with shared dynamic task-based adapters.
- Score: 58.20896429151824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Mixture-of-Experts (MoE) architecture has proven to be a powerful
method for training deep models on diverse tasks across many applications. However,
current MoE implementations are task-agnostic, treating all tokens from different
tasks in the same manner. In this work, we instead design a novel method that
incorporates task information into MoE models at different levels of granularity
with shared dynamic task-based adapters. Our experiments and analysis show the
advantages of our approach over dense and canonical MoE models on multi-task
multilingual machine translation. With task-specific adapters, our models can
additionally generalize to new tasks efficiently.
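The abstract names two ingredients: routing conditioned on task information, and shared dynamic task-based adapters. Below is a minimal sketch of how such a layer could be wired, assuming top-1 token routing, a learned task embedding added to the router input, and a shared bottleneck adapter scaled by a task-specific vector; the class and parameter names (TaskMoELayer, task_emb, task_scale, d_adapter) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of a task-aware MoE layer with a shared, task-modulated adapter.
# Assumptions (not from the paper): top-1 routing, task embedding added to the
# router input, and a shared bottleneck adapter scaled per task.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, num_experts=8, num_tasks=4, d_adapter=64):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )
        self.task_emb = nn.Embedding(num_tasks, d_model)    # task information for routing
        self.router = nn.Linear(d_model, num_experts)
        # shared adapter whose bottleneck activations are scaled per task
        self.adapter_down = nn.Linear(d_model, d_adapter)
        self.adapter_up = nn.Linear(d_adapter, d_model)
        self.task_scale = nn.Embedding(num_tasks, d_adapter)

    def forward(self, x, task_id):
        # x: (batch, seq_len, d_model); task_id: (batch,) long tensor of task indices
        t = self.task_emb(task_id).unsqueeze(1)             # (batch, 1, d_model)
        logits = self.router(x + t)                         # task-aware routing scores
        top1 = logits.argmax(dim=-1)                        # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (top1 == e).unsqueeze(-1)                # which tokens chose expert e
            out = out + mask * expert(x)
        # shared dynamic task-based adapter: shared weights, task-specific scaling
        h = F.relu(self.adapter_down(out)) * self.task_scale(task_id).unsqueeze(1)
        return out + self.adapter_up(h)
```

A forward pass takes the token states together with a per-example task index, e.g. TaskMoELayer()(x, task_id) with x of shape (batch, seq_len, d_model); in practice the dense per-expert loop would be replaced by sparse dispatch for efficiency.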
Related papers
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model [11.885204227946549]
We propose a comprehensive model designed to represent various tasks using a unified representation.
Our model exhibits strong capabilities in comprehending the implicit intent of user instructions.
Our approach exhibits exceptional scalability and generality.
arXiv Detail & Related papers (2024-08-05T14:27:39Z)
- Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond [16.913115978881866]
We propose a framework for unified task embeddings (FUTE) that places task embeddings from various models, including smaller language models and Large Language Models with varied prompts, within a single vector space.
Such uniformity enables comparison and analysis of similarities amongst different models, broadening the scope and utility of existing task embedding methods in multi-model scenarios.
arXiv Detail & Related papers (2024-02-22T13:13:31Z)
- Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters [13.6682552098234]
Large Language Models (LLMs) have achieved amazing zero-shot learning performance over a variety of Natural Language Processing (NLP) tasks.
We present ALTER, a system that effectively builds the multi-tAsk learners with mixTure-of-task-adaptERs upon small language models.
A two-stage training method is proposed to optimize the collaboration between adapters at a small computational cost.
arXiv Detail & Related papers (2023-09-20T03:39:56Z)
- JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving [77.51817534090789]
We propose JiuZhang 2.0, a unified Chinese PLM specially designed for multi-task mathematical problem solving.
Our idea is to maintain a moderate-sized model and employ cross-task knowledge sharing to improve the model capacity in a multi-task setting.
arXiv Detail & Related papers (2023-06-19T15:45:36Z)
- Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts [75.75548749888029]
We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks.
With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.
arXiv Detail & Related papers (2023-05-11T17:57:49Z)
- OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models [72.8156832931841]
Generalist models are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model.
We release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction.
arXiv Detail & Related papers (2022-12-08T17:07:09Z)
- Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model (see the gating sketch after this list).
arXiv Detail & Related papers (2022-04-16T00:56:12Z)
- XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation [80.18830380517753]
We develop a new task-agnostic distillation framework XtremeDistilTransformers.
We study the transferability of several source tasks, augmentation resources and model architecture for distillation.
arXiv Detail & Related papers (2021-06-08T17:49:33Z)
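For the task-aware gating mentioned in the Sparsely Activated Mixture-of-Experts entry above, the following is a hedged sketch of one plausible gate, assuming the gate adds a learned task embedding to the token state and keeps only the top-1 expert so that per-token compute stays close to that of a dense layer; the names (TaskAwareGate, w_gate) are illustrative and not taken from that paper.

```python
# Illustrative task-aware top-1 gate (assumed design, not the paper's code):
# expert scores come from the token state plus a learned task embedding.
import torch
import torch.nn as nn

class TaskAwareGate(nn.Module):
    def __init__(self, d_model=512, num_experts=16, num_tasks=8):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, d_model)
        self.w_gate = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x, task_id):
        # x: (num_tokens, d_model); task_id: (num_tokens,) long tensor
        scores = self.w_gate(x + self.task_emb(task_id))  # task-conditioned expert scores
        probs = torch.softmax(scores, dim=-1)
        gate_val, expert_idx = probs.max(dim=-1)          # one expert per token
        return expert_idx, gate_val                       # dispatch indices and weights
```

Because only one expert runs per token, total parameters grow with the number of experts while per-token FLOPs stay roughly constant, which matches the dense-cost claim in that summary.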
This list is automatically generated from the titles and abstracts of the papers on this site.