MAMMOTH: Massively Multilingual Modular Open Translation @ Helsinki
- URL: http://arxiv.org/abs/2403.07544v1
- Date: Tue, 12 Mar 2024 11:32:30 GMT
- Title: MAMMOTH: Massively Multilingual Modular Open Translation @ Helsinki
- Authors: Timothee Mickus, Stig-Arne Grönroos, Joseph Attieh, Michele Boggia,
Ona De Gibert, Shaoxiong Ji, Niki Andreas Loppi, Alessandro Raganato, Raúl
Vázquez, Jörg Tiedemann
- Abstract summary: We present the MAMMOTH toolkit, a framework for training massively multilingual modular machine translation systems at scale.
We showcase its efficiency across clusters of A100 and V100 NVIDIA GPUs, and discuss our design philosophy and plans for future development.
- Score: 46.62437145754009
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: NLP in the age of monolithic large language models is approaching its limits
in terms of size and information that can be handled. The trend is towards
modularization, a necessary step in the direction of designing smaller
sub-networks and components with specialized functionality. In this paper, we
present the MAMMOTH toolkit: a framework designed for training massively
multilingual modular machine translation systems at scale, initially derived
from OpenNMT-py and then adapted to ensure efficient training across
computation clusters. We showcase its efficiency across clusters of A100 and
V100 NVIDIA GPUs, and discuss our design philosophy and plans for future
development. The toolkit is publicly available online.
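To make the idea of modularization concrete, here is a minimal, hypothetical PyTorch sketch of a translation model built from language-specific encoder and decoder modules around a shared bridge. The class, module names, sizes, and routing logic are illustrative assumptions only and do not correspond to the MAMMOTH configuration format or API.

```python
# Illustrative sketch (not the MAMMOTH API): a modular seq2seq model that
# routes each language pair through language-specific encoder/decoder modules
# while reusing a shared "bridge" component. All names and sizes are made up.
import torch
import torch.nn as nn

class ModularTranslationModel(nn.Module):
    def __init__(self, src_langs, tgt_langs, vocab_size=32000, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One encoder/decoder stack per language; only the modules for the
        # current language pair are active in a given forward pass.
        self.encoders = nn.ModuleDict({
            lang: nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
                num_layers=2)
            for lang in src_langs
        })
        self.decoders = nn.ModuleDict({
            lang: nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
                num_layers=2)
            for lang in tgt_langs
        })
        # Shared components reused by every language pair.
        self.bridge = nn.Linear(d_model, d_model)
        self.generator = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids, src_lang, tgt_lang):
        memory = self.encoders[src_lang](self.embed(src_ids))
        memory = self.bridge(memory)
        out = self.decoders[tgt_lang](self.embed(tgt_ids), memory)
        return self.generator(out)

model = ModularTranslationModel(["en", "fi"], ["de", "sv"])
logits = model(torch.randint(0, 32000, (2, 7)),
               torch.randint(0, 32000, (2, 5)), "en", "de")
print(logits.shape)  # torch.Size([2, 5, 32000])
```

In a task-parallel training setup of the kind such a toolkit targets, each device would only need to hold the modules used by its assigned language pairs, which is what makes training across computation clusters tractable.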
Related papers
- CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models [59.91221728187576]
This paper introduces the CMU Linguistic Annotation Backend, an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models.
CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages.
arXiv Detail & Related papers (2024-04-03T02:21:46Z)
- CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation [31.911593690549633]
Multimodal machine translation (MMT) systems enhance neural machine translation (NMT) with visual knowledge.
Previous works face a challenge in training powerful MMT models from scratch due to the scarcity of annotated multilingual vision-language data.
We propose CLIPTrans, which simply adapts the independently pre-trained multimodal M-CLIP and the multilingual mBART.
arXiv Detail & Related papers (2023-08-29T11:29:43Z)
- ModuleFormer: Modularity Emerges from Mixture-of-Experts [60.6148988099284]
This paper proposes a new neural network architecture, ModuleFormer, to improve the efficiency and flexibility of large language models.
Unlike the previous SMoE-based modular language model, ModuleFormer can induce modularity from uncurated data (a generic sketch of sparse top-k expert routing is given after this list).
arXiv Detail & Related papers (2023-06-07T17:59:57Z)
- Otter: A Multi-Modal Model with In-Context Instruction Tuning [30.804061018682244]
We introduce instruction tuning into multi-modal models, motivated by the Flamingo model's upstream interleaved format pretraining dataset.
We then introduce Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following ability and in-context learning.
arXiv Detail & Related papers (2023-05-05T17:59:46Z)
- Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation [48.37939354609931]
We propose a novel efficient training recipe, upon which we build an effective detachable model, Lego-MT.
Experiments show that Lego-MT with 1.2B parameters brings an average gain of 3.2 spBLEU.
The proposed training recipe brings a 28.2× speedup over the conventional multi-way training method.
arXiv Detail & Related papers (2022-12-20T18:54:08Z)
- Scalable and Efficient MoE Training for Multitask Multilingual Models [55.987536562357086]
We develop a system capable of scaling MoE models efficiently to trillions of parameters.
We also present new training methods to improve MoE sample efficiency and leverage an expert pruning strategy to improve time efficiency.
A model trained with 10 billion parameters on 50 languages can achieve state-of-the-art performance in Machine Translation (MT) and multilingual natural language generation tasks.
arXiv Detail & Related papers (2021-09-22T00:57:46Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-centric models brings gains of more than 10 BLEU when directly translating between non-English directions, while performing competitively with the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
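Two of the entries above (ModuleFormer and the scalable MoE training system) build on sparsely-gated mixture-of-experts (SMoE) layers. The generic top-k routing sketch below illustrates the standard SMoE pattern; the hyperparameters and structure are illustrative assumptions, not code from either paper.

```python
# Generic sketch of a sparsely-gated mixture-of-experts (SMoE) layer with
# top-k routing. Structure and hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts))

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)
        out = torch.zeros_like(x)
        # Each token is processed only by its k selected experts.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = TopKMoELayer()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```

Because only k experts run per token, total parameter count grows with the number of experts while per-token compute stays roughly constant, which is what makes MoE models attractive for scaling multilingual systems.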
This list is automatically generated from the titles and abstracts of the papers on this site.