Learning Multilingual Sentence Representations with Cross-lingual
Consistency Regularization
- URL: http://arxiv.org/abs/2306.06919v1
- Date: Mon, 12 Jun 2023 07:39:06 GMT
- Title: Learning Multilingual Sentence Representations with Cross-lingual
Consistency Regularization
- Authors: Pengzhi Gao, Liwen Zhang, Zhongjun He, Hua Wu, Haifeng Wang
- Abstract summary: We introduce MuSR: a one-for-all Multilingual Sentence Representation model that supports more than 220 languages.
We train a multilingual Transformer encoder, coupled with an auxiliary Transformer decoder, by adopting a multilingual NMT framework.
Experimental results on multilingual similarity search and bitext mining tasks show the effectiveness of our approach.
- Score: 46.09132547431629
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual sentence representations are the foundation for similarity-based
bitext mining, which is crucial for scaling multilingual neural machine
translation (NMT) systems to more languages. In this paper, we introduce MuSR: a
one-for-all Multilingual Sentence Representation model that supports more than
220 languages. Leveraging billions of English-centric parallel corpora, we
train a multilingual Transformer encoder, coupled with an auxiliary Transformer
decoder, by adopting a multilingual NMT framework with CrossConST, a
cross-lingual consistency regularization technique proposed in Gao et al.
(2023). Experimental results on multilingual similarity search and bitext
mining tasks show the effectiveness of our approach. Specifically, MuSR
achieves superior performance over LASER3 (Heffernan et al., 2022), which
consists of 148 independent multilingual sentence encoders.
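The abstract does not spell out the CrossConST objective, so the sketch below illustrates one common formulation of a cross-lingual consistency term on top of an encoder-decoder NMT loss: a KL penalty that pulls the decoder's output distribution given the source sentence toward its distribution given the target sentence. The `model(encoder_tokens, decoder_inputs)` call signature, the pad id, the stop-gradient, and the value of `alpha` are illustrative assumptions rather than the paper's exact formulation (see Gao et al., 2023 for that).

```python
import torch
import torch.nn.functional as F

PAD_ID = 0  # assumed padding token id

def crossconst_style_loss(model, src, tgt, tgt_in, tgt_out, alpha=0.25):
    """Translation cross-entropy plus a cross-lingual consistency (KL) penalty.

    Assumes `model(encoder_tokens, decoder_inputs)` returns decoder logits of
    shape (batch, tgt_len, vocab). `src`/`tgt` are source- and target-language
    token ids of a parallel pair; `tgt_in`/`tgt_out` are the shifted decoder
    inputs and labels. Names and hyper-parameters are illustrative only.
    """
    # Standard cross-entropy for the supervised x -> y direction.
    logits_xy = model(src, tgt_in)                        # p(y | x)
    ce = F.cross_entropy(
        logits_xy.reshape(-1, logits_xy.size(-1)),
        tgt_out.reshape(-1),
        ignore_index=PAD_ID,
    )

    # Second forward pass feeding the target sentence to the encoder (y -> y).
    with torch.no_grad():                                 # stop-gradient on one side (a common choice)
        probs_yy = F.softmax(model(tgt, tgt_in), dim=-1)  # p(y | y)

    # KL term pulling p(y | x) toward p(y | y), so the encoder maps x and y
    # to representations that decode to the same distribution.
    # (Padding positions are not masked out of the KL term here, for brevity.)
    kl = F.kl_div(
        F.log_softmax(logits_xy, dim=-1),
        probs_yy,
        reduction="batchmean",
    )
    return ce + alpha * kl
```

Intuitively, the KL term rewards the encoder for mapping a sentence and its translation to representations that decode identically, which is why such a regularizer can yield embeddings that transfer well to similarity search and bitext mining.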
Related papers
- m3P: Towards Multimodal Multilingual Translation with Multimodal Prompt [39.2728779674405]
We propose a framework that leverages multimodal prompts to guide Multimodal Multilingual neural Machine Translation (m3P).
Our method aims to minimize the representation distance of different languages by regarding the image as a central language.
Experimental results show that m3P outperforms previous text-only baselines and multilingual multimodal methods by a large margin.
arXiv Detail & Related papers (2024-03-26T10:04:24Z)
- Towards Boosting Many-to-Many Multilingual Machine Translation with Large Language Models [47.39529535727593]
This paper focuses on boosting many-to-many multilingual translation of large language models (LLMs) with an emphasis on zero-shot translation directions.
We introduce a cross-lingual consistency regularization, XConST, to bridge the representation gap among different languages.
Experimental results on ALMA, Tower, and LLaMA-2 show that our approach consistently improves translation performance.
arXiv Detail & Related papers (2024-01-11T12:11:30Z)
- Improved Cross-Lingual Transfer Learning For Automatic Speech Translation [18.97234151624098]
We show that by initializing the encoder of the encoder-decoder sequence-to-sequence translation model with SAMU-XLS-R, we achieve significantly better cross-lingual task knowledge transfer.
We demonstrate the effectiveness of our approach on two popular datasets, namely, CoVoST-2 and Europarl.
arXiv Detail & Related papers (2023-06-01T15:19:06Z)
- Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning [48.15259834021655]
We present a pragmatic approach towards building a multilingual machine translation model that covers hundreds of languages.
We use a mixture of supervised and self-supervised objectives, depending on the data availability for different language pairs (see the sketch after this list).
We demonstrate that the synergy between these two training paradigms enables the model to produce high-quality translations in the zero-resource setting.
arXiv Detail & Related papers (2022-01-09T23:36:44Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This effectively avoids the degenerate case of predicting masked words conditioned only on context in the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models into a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs (sketched after this list).
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
- Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [61.88012735215636]
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs.
However, existing UNMT systems can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time.
In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder.
arXiv Detail & Related papers (2020-04-21T17:26:16Z)
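The "Towards the Next 1000 Languages" entry above mentions mixing supervised and self-supervised objectives depending on data availability. Below is a minimal sketch of one plausible way to route examples between a translation loss and a denoising loss; all helper names (`mask_spans`, `model.translation_loss`, `model.denoising_loss`) and the corruption scheme are illustrative assumptions, not that paper's actual recipe.

```python
import random

def mask_spans(tokens, mask_id=3, mask_ratio=0.35):
    """Corrupt a token-id list by masking one random contiguous span
    (a rough MASS/BART-style corruption; details are illustrative)."""
    n = max(1, int(len(tokens) * mask_ratio))
    start = random.randrange(0, max(1, len(tokens) - n + 1))
    return tokens[:start] + [mask_id] * n + tokens[start + n:]

def mixed_objective_loss(model, example, parallel_pairs):
    """Route an example to a supervised or self-supervised loss by data availability.

    `parallel_pairs` is the set of language pairs with parallel data.
    `model.translation_loss`, `model.denoising_loss`, and the `example`
    fields are hypothetical helpers used purely for illustration.
    """
    if (example.src_lang, example.tgt_lang) in parallel_pairs:
        # Parallel data available: ordinary supervised translation cross-entropy.
        return model.translation_loss(example.src_tokens, example.tgt_tokens,
                                      example.src_lang, example.tgt_lang)
    # Monolingual-only data: reconstruct the original sentence from a
    # corrupted copy (denoising objective on the target language).
    corrupted = mask_spans(example.tgt_tokens)
    return model.denoising_loss(corrupted, example.tgt_tokens, example.tgt_lang)
```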
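The "Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation" entry proposes random online backtranslation. The sketch below illustrates the general idea of back-translating the target sentence into a randomly chosen language with the current model and then training on the resulting unseen direction; `model.loss` and `model.translate` are hypothetical helpers rather than an actual library API, and sampling and batching details are omitted.

```python
import random
import torch

def random_online_backtranslation_step(model, batch, languages, optimizer):
    """One training step mixing supervised translation with random online
    backtranslation, to give gradient signal to unseen (zero-shot) pairs.

    Assumes `model.translate(tokens, tgt_lang=...)` decodes with the current
    parameters and `model.loss(src, src_lang, tgt, tgt_lang)` returns the
    translation cross-entropy; both are illustrative helpers.
    """
    src, src_lang, tgt, tgt_lang = batch

    # 1) Regular supervised loss on the observed (src_lang -> tgt_lang) pair.
    loss = model.loss(src, src_lang, tgt, tgt_lang)

    # 2) Pick a random pivot language and back-translate the target into it
    #    with the current model (no gradients through decoding).
    pivot_lang = random.choice([l for l in languages if l != tgt_lang])
    with torch.no_grad():
        synthetic_src = model.translate(tgt, tgt_lang=pivot_lang)

    # 3) Train the possibly-unseen (pivot_lang -> tgt_lang) direction on the
    #    synthetic pair, which is what enforces zero-shot coverage.
    loss = loss + model.loss(synthetic_src, pivot_lang, tgt, tgt_lang)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```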