mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
- URL: http://arxiv.org/abs/2104.08692v1
- Date: Sun, 18 Apr 2021 03:24:07 GMT
- Title: mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
- Authors: Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei
- Abstract summary: We improve the multilingual text-to-text transfer Transformer with translation pairs, yielding mT6.
We explore three cross-lingual text-to-text pre-training tasks, namely, machine translation, translation pair span corruption, and translation span corruption.
Experimental results show that the proposed mT6 improves cross-lingual transferability over mT5.
- Score: 51.67970832510462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive
monolingual texts, which has shown promising results on many cross-lingual
tasks. In this paper, we improve multilingual text-to-text transfer Transformer
with translation pairs (mT6). Specifically, we explore three cross-lingual
text-to-text pre-training tasks, namely, machine translation, translation pair
span corruption, and translation span corruption. In addition, we propose a
partially non-autoregressive objective for text-to-text pre-training. We
evaluate the methods on seven multilingual benchmark datasets, including
sentence classification, named entity recognition, question answering, and
abstractive summarization. Experimental results show that the proposed mT6
improves cross-lingual transferability over mT5.
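
The abstract names the three cross-lingual text-to-text tasks only at a high level. The minimal Python sketch below (not the authors' code) illustrates how each task could turn a parallel sentence pair into an (input, target) example; the T5-style <extra_id_i> sentinels, the hard-coded span positions, the helper names, and the choice to corrupt only the source side in the last task are illustrative assumptions. The partially non-autoregressive objective concerns how the target sequence is predicted rather than how the examples are built, so it is not sketched here.

```python
# Illustrative sketch of the three cross-lingual text-to-text tasks named in the
# abstract. Spans are given explicitly for readability; real pre-training would
# sample them. Sentinel naming follows the T5 <extra_id_i> convention.

def machine_translation(src_tokens, tgt_tokens):
    """Machine translation: the source sentence is the input, the target sentence the output."""
    return src_tokens, tgt_tokens

def translation_pair_span_corruption(src_tokens, tgt_tokens, spans):
    """Concatenate the translation pair and corrupt spans anywhere in it;
    the target lists the removed spans, each preceded by its sentinel."""
    pair = src_tokens + tgt_tokens
    inp, out, used, i = [], [], 0, 0
    while i < len(pair):
        span = next((s for s in spans if s[0] == i), None)
        if span is None:
            inp.append(pair[i])
            i += 1
        else:
            sentinel = f"<extra_id_{used}>"
            used += 1
            inp.append(sentinel)
            out.append(sentinel)
            out.extend(pair[span[0]:span[1]])
            i = span[1]
    return inp, out

def translation_span_corruption(src_tokens, tgt_tokens, spans):
    """Corrupt spans on only one side of the pair (here: the source side),
    so the model can recover them partly from the other language."""
    source_only = [s for s in spans if s[1] <= len(src_tokens)]
    return translation_pair_span_corruption(src_tokens, tgt_tokens, source_only)

# Example with a toy English-German pair and one corrupted span.
en = ["Thank", "you", "very", "much"]
de = ["Vielen", "Dank"]
print(translation_span_corruption(en, de, [(2, 4)]))
# (['Thank', 'you', '<extra_id_0>', 'Vielen', 'Dank'], ['<extra_id_0>', 'very', 'much'])
```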
Related papers
- Translation-Enhanced Multilingual Text-to-Image Generation [61.41730893884428]
Research on text-to-image generation (TTI) still predominantly focuses on the English language.
In this work, we thus investigate multilingual TTI (mTTI) and the current potential of neural machine translation (NMT) to bootstrap mTTI systems.
We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate the multilingual text knowledge within the mTTI framework.
arXiv Detail & Related papers (2023-05-30T17:03:52Z)
- Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in the area focuses on the multilingual models rather than the Machine Translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z)
- mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations [54.42422445568523]
mmT5 is a modular multilingual sequence-to-sequence model.
It disentangles language-specific information from language-agnostic information.
Compared to mT5, mmT5 raises the rate of generating text in the correct language under zero-shot settings from 7% to 99%.
arXiv Detail & Related papers (2023-05-23T16:38:01Z)
- mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences [17.461172187276734]
This model builds upon the architecture of LongT5, while leveraging the multilingual datasets used for pretraining mT5 and the pretraining tasks of UL2.
We evaluate this model on a variety of multilingual summarization and question-answering tasks, and the results show stronger performance for mLongT5 when compared to existing multilingual models such as mBART or M-BERT.
arXiv Detail & Related papers (2023-05-18T17:22:53Z)
- Evaluating Byte and Wordpiece Level Models for Massively Multilingual Semantic Parsing [3.431659287330068]
We compare a byte-level (ByT5) and a wordpiece-based (mT5) sequence-to-sequence model on the 51 languages of the MASSIVE multilingual semantic parsing dataset.
We are able to reduce the gap in exact match accuracy to only 5 points with respect to a model trained on gold data from all the languages.
arXiv Detail & Related papers (2022-12-14T13:48:32Z)
- nmT5 -- Is parallel data still relevant for pre-training massively multilingual language models? [9.560948239388662]
We investigate the impact of incorporating parallel data into mT5 pre-training.
We find that multi-tasking language modeling with objectives such as machine translation is a straightforward way to improve performance.
arXiv Detail & Related papers (2021-06-03T23:12:27Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- mT5: A massively multilingual pre-trained text-to-text transformer [60.0210636815514]
"Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on English-language NLP tasks.
We introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages.
arXiv Detail & Related papers (2020-10-22T17:58:14Z)
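
As a complement to the mT5 entry above, the minimal Python sketch below illustrates the unified text-to-text format that T5 and mT5 rely on: every task is cast as text in, text out, typically with a task prefix on the input. The prefixes follow the published T5 convention, while the example sentences are invented for illustration.

```python
# Toy illustration of the unified text-to-text format used by T5/mT5:
# every task becomes a mapping from an input string to a target string,
# so one sequence-to-sequence model can serve them all.

examples = [
    # (task-prefixed input text, target text)
    ("translate English to German: Thank you very much.", "Vielen Dank."),
    ("summarize: The quick brown fox jumped over the lazy dog by the river.",
     "A fox jumped over a dog."),
    ("cola sentence: The books is on the table.", "unacceptable"),
]

for source, target in examples:
    # Classification, translation, and summarization all share one string-to-string interface.
    print(f"{source!r}  ->  {target!r}")
```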