Task-Based MoE for Multitask Multilingual Machine Translation
- URL: http://arxiv.org/abs/2308.15772v3
- Date: Tue, 24 Oct 2023 22:59:26 GMT
- Title: Task-Based MoE for Multitask Multilingual Machine Translation
- Authors: Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff,
Barnabas Poczos, Hany Hassan Awadalla
- Abstract summary: The Mixture-of-Experts (MoE) architecture has proven to be a powerful method for training deep models on diverse tasks across many applications.
In this work, we design a novel method that incorporates task information into MoE models at different levels of granularity with shared dynamic task-based adapters.
- Score: 58.20896429151824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Mixture-of-Experts (MoE) architecture has proven to be a powerful
method for training deep models on diverse tasks across many applications. However,
current MoE implementations are task-agnostic, treating all tokens from different
tasks in the same manner. In this work, we instead design a novel method that
incorporates task information into MoE models at different levels of granularity
with shared dynamic task-based adapters. Our experiments and analysis show the
advantages of our approach over dense and canonical MoE models on multi-task
multilingual machine translation. With task-specific adapters, our models can
additionally generalize to new tasks efficiently.
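The abstract names two ingredients: routing conditioned on task information, and shared dynamic task-based adapters. Below is a minimal sketch of how such a layer could be wired, assuming top-1 token routing, a learned task embedding added to the router input, and a shared bottleneck adapter scaled by a task-specific vector; the class and parameter names (TaskMoELayer, task_emb, task_scale, d_adapter) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of a task-aware MoE layer with a shared, task-modulated adapter.
# Assumptions (not from the paper): top-1 routing, task embedding added to the
# router input, and a shared bottleneck adapter scaled per task.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, num_experts=8, num_tasks=4, d_adapter=64):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )
        self.task_emb = nn.Embedding(num_tasks, d_model)    # task information for routing
        self.router = nn.Linear(d_model, num_experts)
        # shared adapter whose bottleneck activations are scaled per task
        self.adapter_down = nn.Linear(d_model, d_adapter)
        self.adapter_up = nn.Linear(d_adapter, d_model)
        self.task_scale = nn.Embedding(num_tasks, d_adapter)

    def forward(self, x, task_id):
        # x: (batch, seq_len, d_model); task_id: (batch,) long tensor of task indices
        t = self.task_emb(task_id).unsqueeze(1)             # (batch, 1, d_model)
        logits = self.router(x + t)                         # task-aware routing scores
        top1 = logits.argmax(dim=-1)                        # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (top1 == e).unsqueeze(-1)                # which tokens chose expert e
            out = out + mask * expert(x)
        # shared dynamic task-based adapter: shared weights, task-specific scaling
        h = F.relu(self.adapter_down(out)) * self.task_scale(task_id).unsqueeze(1)
        return out + self.adapter_up(h)
```

A forward pass takes the token states together with a per-example task index, e.g. TaskMoELayer()(x, task_id) with x of shape (batch, seq_len, d_model); in practice the dense per-expert loop would be replaced by sparse dispatch for efficiency.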
Related papers
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model [11.885204227946549]
We propose a comprehensive model designed to represent various tasks using a unified representation.
Our model exhibits strong capabilities in comprehending the implicit intent of user instructions.
Our approach exhibits exceptional scalability and generality.
arXiv Detail & Related papers (2024-08-05T14:27:39Z)
- Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond [16.913115978881866]
We propose a framework for unified task embeddings (FUTE) that places task embeddings from various models, including smaller language models and Large Language Models with varied prompts, within a single vector space.
Such uniformity enables comparison and analysis of similarities amongst different models, broadening the scope and utility of existing task embedding methods in multi-model scenarios.
arXiv Detail & Related papers (2024-02-22T13:13:31Z)
- Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters [13.6682552098234]
Large Language Models (LLMs) have achieved amazing zero-shot learning performance over a variety of Natural Language Processing (NLP) tasks.
We present ALTER, a system that effectively builds the multi-tAsk learners with mixTure-of-task-adaptERs upon small language models.
A two-stage training method is proposed to optimize the collaboration between adapters at a small computational cost.
arXiv Detail & Related papers (2023-09-20T03:39:56Z)
- JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving [77.51817534090789]
We propose JiuZhang 2.0, a unified Chinese PLM specially designed for multi-task mathematical problem solving.
Our idea is to maintain a moderate-sized model and employ cross-task knowledge sharing to improve the model capacity in a multi-task setting.
arXiv Detail & Related papers (2023-06-19T15:45:36Z)
- Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts [75.75548749888029]
We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks.
With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.
arXiv Detail & Related papers (2023-05-11T17:57:49Z)
- OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models [72.8156832931841]
Generalist models are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model.
We release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction.
arXiv Detail & Related papers (2022-12-08T17:07:09Z)
- Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model (see the gating sketch after this list).
arXiv Detail & Related papers (2022-04-16T00:56:12Z)
- XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation [80.18830380517753]
We develop a new task-agnostic distillation framework XtremeDistilTransformers.
We study the transferability of several source tasks, augmentation resources and model architecture for distillation.
arXiv Detail & Related papers (2021-06-08T17:49:33Z)
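For the task-aware gating mentioned in the Sparsely Activated Mixture-of-Experts entry above, the following is a hedged sketch of one plausible gate, assuming the gate adds a learned task embedding to the token state and keeps only the top-1 expert so that per-token compute stays close to that of a dense layer; the names (TaskAwareGate, w_gate) are illustrative and not taken from that paper.

```python
# Illustrative task-aware top-1 gate (assumed design, not the paper's code):
# expert scores come from the token state plus a learned task embedding.
import torch
import torch.nn as nn

class TaskAwareGate(nn.Module):
    def __init__(self, d_model=512, num_experts=16, num_tasks=8):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, d_model)
        self.w_gate = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x, task_id):
        # x: (num_tokens, d_model); task_id: (num_tokens,) long tensor
        scores = self.w_gate(x + self.task_emb(task_id))  # task-conditioned expert scores
        probs = torch.softmax(scores, dim=-1)
        gate_val, expert_idx = probs.max(dim=-1)          # one expert per token
        return expert_idx, gate_val                       # dispatch indices and weights
```

Because only one expert runs per token, total parameters grow with the number of experts while per-token FLOPs stay roughly constant, which matches the dense-cost claim in that summary.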
This list is automatically generated from the titles and abstracts of the papers on this site.