mmT5: Modular Multilingual Pre-Training Solves Source Language
Hallucinations
- URL: http://arxiv.org/abs/2305.14224v1
- Date: Tue, 23 May 2023 16:38:01 GMT
- Title: mmT5: Modular Multilingual Pre-Training Solves Source Language
Hallucinations
- Authors: Jonas Pfeiffer, Francesco Piccinno, Massimo Nicosia, Xinyi Wang,
Machel Reid, Sebastian Ruder
- Abstract summary: mmT5 is a modular multilingual sequence-to-sequence model.
It disentangles language-specific information from language-agnostic information.
Compared to mT5, mmT5 raises the rate of generating text in the correct language under zero-shot settings from 7% to 99%.
- Score: 54.42422445568523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual sequence-to-sequence models perform poorly with increased
language coverage and fail to consistently generate text in the correct target
language in few-shot settings. To address these challenges, we propose mmT5, a
modular multilingual sequence-to-sequence model. mmT5 utilizes
language-specific modules during pre-training, which disentangle
language-specific information from language-agnostic information. We identify
representation drift during fine-tuning as a key limitation of modular
generative models and develop strategies that enable effective zero-shot
transfer. Our model outperforms mT5 at the same parameter sizes by a large
margin on representative natural language understanding and generation tasks in
40+ languages. Compared to mT5, mmT5 raises the rate of generating text in the
correct language under zero-shot settings from 7% to 99%, thereby greatly
alleviating the source language hallucination problem.
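A minimal sketch of the modular idea described in the abstract, assuming the language-specific modules are small bottleneck layers inserted into an otherwise shared transformer block and selected by language ID; class names, dimensions, and adapter placement are illustrative, not the paper's implementation:

```python
# Sketch of modular multilingual pre-training: a shared feed-forward block
# followed by a small language-specific bottleneck module chosen by language
# ID. Names and sizes are illustrative, not mmT5's actual configuration.
import torch
import torch.nn as nn

class LanguageAdapter(nn.Module):
    """Small bottleneck module holding language-specific parameters."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the shared representation intact.
        return x + self.up(torch.relu(self.down(x)))

class ModularFFNLayer(nn.Module):
    """Shared (language-agnostic) feed-forward sublayer plus per-language adapters."""
    def __init__(self, d_model: int, d_ff: int, languages: list[str]):
        super().__init__()
        self.shared_ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        # One adapter per language; only the adapter of the batch's language
        # is used (and updated) for that batch.
        self.adapters = nn.ModuleDict(
            {lang: LanguageAdapter(d_model) for lang in languages}
        )

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        shared = x + self.shared_ffn(x)     # language-agnostic path
        return self.adapters[lang](shared)  # language-specific path

# Example: route the same hidden states through different language modules.
layer = ModularFFNLayer(d_model=512, d_ff=2048, languages=["en", "sw", "de"])
hidden = torch.randn(2, 16, 512)            # (batch, seq_len, d_model)
out_en = layer(hidden, lang="en")
out_sw = layer(hidden, lang="sw")
```

This only shows the routing mechanism; the paper's fine-tuning strategies against representation drift, which enable the zero-shot transfer results, are not depicted here.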
Related papers
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- idT5: Indonesian Version of Multilingual T5 Transformer [0.0]
Indonesian is spoken by almost 200 million people and is the 10th most spoken language in the world.
In this study, mT5 was adapted to a single language, Indonesian, yielding a smaller pre-trained T5 model specific to Indonesian.
A model fine-tuned from our model achieved 77.18% accuracy on sentiment analysis (SA), 8% higher than the mT5-based model, and obtained nearly the same scores as the mT5-based model on question generation (QG) and question answering (QA).
arXiv Detail & Related papers (2023-02-02T03:56:16Z)
- Evaluating Byte and Wordpiece Level Models for Massively Multilingual Semantic Parsing [3.431659287330068]
We compare a byte-level (ByT5) and a wordpiece-based (mT5) sequence-to-sequence model on the 51 languages of the MASSIVE multilingual semantic parsing dataset.
We are able to reduce the gap in exact match accuracy to only 5 points with respect to a model trained on gold data from all the languages.
arXiv Detail & Related papers (2022-12-14T13:48:32Z)
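The core contrast in the byte-versus-wordpiece comparison above can be seen directly in the two tokenizers. A small sketch using the public google/byt5-small and google/mt5-small checkpoints; this is not the paper's MASSIVE evaluation setup:

```python
# ByT5 consumes raw UTF-8 bytes, while mT5 uses a SentencePiece subword
# vocabulary, so the same utterance yields very different input lengths.
from transformers import AutoTokenizer

byt5_tok = AutoTokenizer.from_pretrained("google/byt5-small")
mt5_tok = AutoTokenizer.from_pretrained("google/mt5-small")

utterance = "wake me up at nine am on friday"
byte_ids = byt5_tok(utterance).input_ids     # roughly one id per UTF-8 byte (+ EOS)
subword_ids = mt5_tok(utterance).input_ids   # far fewer, subword-level ids

print(len(byte_ids), len(subword_ids))       # byte sequences are much longer
```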
- Crosslingual Generalization through Multitask Finetuning [80.8822603322471]
Multitask prompted finetuning (MTF) has been shown to help large language models generalize to new tasks in a zero-shot setting.
We apply MTF to the pretrained multilingual BLOOM and mT5 model families to produce finetuned variants called BLOOMZ and mT0.
We find that finetuning large multilingual language models on English tasks with English prompts enables task generalization to non-English languages.
arXiv Detail & Related papers (2022-11-03T13:19:32Z)
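A small usage sketch of the zero-shot behavior described in the BLOOMZ/mT0 entry above: prompting a released mT0 checkpoint in a non-English language. The checkpoint size and the Spanish prompt are illustrative choices, not taken from the paper:

```python
# Zero-shot use of an mT0 checkpoint (an mT5 model finetuned with multitask
# prompted finetuning) on a non-English instruction.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "bigscience/mt0-small"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

# A Spanish sentiment prompt, even though finetuning used English prompts.
prompt = "¿La reseña «Una película maravillosa» es positiva o negativa?"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```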
- T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5 [65.32642587901903]
We conduct extensive studies on how PLMs with different tokenization strategies affect spoken language understanding tasks.
We extend the idea to create T5lephone, a variant of T5 that is pretrained using phonemicized text.
arXiv Detail & Related papers (2022-11-01T17:00:23Z)
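For the T5lephone entry above, a rough sketch of phonemicizing text before T5-style pre-training; the phonemizer package and espeak backend are assumptions for illustration, not the paper's actual pipeline:

```python
# Convert orthographic text to phoneme strings, roughly the input
# representation T5lephone is built on.
from phonemizer import phonemize  # pip install phonemizer (needs espeak-ng)

sentences = ["set an alarm for seven thirty", "play some jazz music"]
phoneme_text = phonemize(
    sentences,
    language="en-us",
    backend="espeak",
    strip=True,
)
print(phoneme_text)  # IPA phoneme strings, one per input sentence
```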
- Sequence to sequence pretraining for a less-resourced Slovenian language [0.0]
We train two T5-type sequence-to-sequence models of different sizes for the morphologically rich Slovene language with far fewer resources and analyze their behavior.
On classification tasks, the SloT5 models mostly lag behind the monolingual Slovene SloBERTa model, but they are worth considering for generative tasks.
arXiv Detail & Related papers (2022-07-28T10:08:50Z)
- mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs [51.67970832510462]
We improve the multilingual Text-to-Text Transfer Transformer with translation pairs, yielding mT6.
We explore three cross-lingual text-to-text pre-training tasks, namely, machine translation, translation pair span corruption, and translation span corruption.
Experimental results show that the proposed mT6 improves cross-lingual transferability over mT5.
arXiv Detail & Related papers (2021-04-18T03:24:07Z)
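For the mT6 entry above, a rough sketch of one of its three objectives, translation pair span corruption: concatenate a translation pair, mask spans with T5-style sentinel tokens, and predict the masked spans. The function name, span selection, and formatting are illustrative, not mT6's exact recipe:

```python
# Illustrative translation pair span corruption over a concatenated
# source-target pair, using T5-style <extra_id_i> sentinel tokens.
import random

def translation_pair_span_corruption(tokens, span_len=2, num_spans=2, seed=0):
    """Mask `num_spans` non-overlapping spans with T5-style sentinels."""
    rng = random.Random(seed)
    candidates = list(range(len(tokens) - span_len + 1))
    rng.shuffle(candidates)
    starts = []
    for cand in candidates:
        if all(abs(cand - s) >= span_len for s in starts):
            starts.append(cand)
        if len(starts) == num_spans:
            break
    starts.sort()
    source, target, prev_end = [], [], 0
    for i, start in enumerate(starts):
        sentinel = f"<extra_id_{i}>"
        source += tokens[prev_end:start] + [sentinel]
        target += [sentinel] + tokens[start:start + span_len]
        prev_end = start + span_len
    source += tokens[prev_end:]
    return " ".join(source), " ".join(target)

# A concatenated English-German translation pair as one token sequence.
pair = "The cat sleeps . </s> Die Katze schläft .".split()
src, tgt = translation_pair_span_corruption(pair)
print(src)  # corrupted input with sentinel tokens
print(tgt)  # sentinel-delimited spans to reconstruct
```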
"Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on English-language NLP tasks.
We introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages.
arXiv Detail & Related papers (2020-10-22T17:58:14Z)