Learning Multilingual Sentence Representations with Cross-lingual
Consistency Regularization
- URL: http://arxiv.org/abs/2306.06919v1
- Date: Mon, 12 Jun 2023 07:39:06 GMT
- Title: Learning Multilingual Sentence Representations with Cross-lingual
Consistency Regularization
- Authors: Pengzhi Gao, Liwen Zhang, Zhongjun He, Hua Wu, Haifeng Wang
- Abstract summary: We introduce MuSR: a one-for-all Multilingual Sentence Representation model that supports more than 220 languages.
We train a multilingual Transformer encoder, coupled with an auxiliary Transformer decoder, by adopting a multilingual NMT framework.
Experimental results on multilingual similarity search and bitext mining tasks show the effectiveness of our approach.
- Score: 46.09132547431629
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual sentence representations are the foundation for similarity-based
bitext mining, which is crucial for scaling multilingual neural machine
translation (NMT) systems to more languages. In this paper, we introduce MuSR: a
one-for-all Multilingual Sentence Representation model that supports more than
220 languages. Leveraging billions of English-centric parallel corpora, we
train a multilingual Transformer encoder, coupled with an auxiliary Transformer
decoder, by adopting a multilingual NMT framework with CrossConST, a
cross-lingual consistency regularization technique proposed in Gao et al.
(2023). Experimental results on multilingual similarity search and bitext
mining tasks show the effectiveness of our approach. Specifically, MuSR
achieves superior performance over LASER3 (Heffernan et al., 2022), which
consists of 148 independent multilingual sentence encoders.
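The abstract does not spell out the CrossConST objective, so the sketch below illustrates one common formulation of a cross-lingual consistency term on top of an encoder-decoder NMT loss: a KL penalty that pulls the decoder's output distribution given the source sentence toward its distribution given the target sentence. The `model(encoder_tokens, decoder_inputs)` call signature, the pad id, the stop-gradient, and the value of `alpha` are illustrative assumptions rather than the paper's exact formulation (see Gao et al., 2023 for that).

```python
import torch
import torch.nn.functional as F

PAD_ID = 0  # assumed padding token id

def crossconst_style_loss(model, src, tgt, tgt_in, tgt_out, alpha=0.25):
    """Translation cross-entropy plus a cross-lingual consistency (KL) penalty.

    Assumes `model(encoder_tokens, decoder_inputs)` returns decoder logits of
    shape (batch, tgt_len, vocab). `src`/`tgt` are source- and target-language
    token ids of a parallel pair; `tgt_in`/`tgt_out` are the shifted decoder
    inputs and labels. Names and hyper-parameters are illustrative only.
    """
    # Standard cross-entropy for the supervised x -> y direction.
    logits_xy = model(src, tgt_in)                        # p(y | x)
    ce = F.cross_entropy(
        logits_xy.reshape(-1, logits_xy.size(-1)),
        tgt_out.reshape(-1),
        ignore_index=PAD_ID,
    )

    # Second forward pass feeding the target sentence to the encoder (y -> y).
    with torch.no_grad():                                 # stop-gradient on one side (a common choice)
        probs_yy = F.softmax(model(tgt, tgt_in), dim=-1)  # p(y | y)

    # KL term pulling p(y | x) toward p(y | y), so the encoder maps x and y
    # to representations that decode to the same distribution.
    # (Padding positions are not masked out of the KL term here, for brevity.)
    kl = F.kl_div(
        F.log_softmax(logits_xy, dim=-1),
        probs_yy,
        reduction="batchmean",
    )
    return ce + alpha * kl
```

Intuitively, the KL term rewards the encoder for mapping a sentence and its translation to representations that decode identically, which is why such a regularizer can yield embeddings that transfer well to similarity search and bitext mining.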
Related papers
- m3P: Towards Multimodal Multilingual Translation with Multimodal Prompt [39.2728779674405]
We propose a framework that leverages multimodal prompts to guide Multimodal Multilingual neural Machine Translation (m3P).
Our method aims to minimize the representation distance of different languages by regarding the image as a central language.
Experimental results show that m3P outperforms previous text-only baselines and multilingual multimodal methods by a large margin.
arXiv Detail & Related papers (2024-03-26T10:04:24Z)
- Towards Boosting Many-to-Many Multilingual Machine Translation with Large Language Models [47.39529535727593]
This paper focuses on boosting many-to-many multilingual translation of large language models (LLMs) with an emphasis on zero-shot translation directions.
We introduce a cross-lingual consistency regularization, XConST, to bridge the representation gap among different languages.
Experimental results on ALMA, Tower, and LLaMA-2 show that our approach consistently improves translation performance.
arXiv Detail & Related papers (2024-01-11T12:11:30Z)
- Improved Cross-Lingual Transfer Learning For Automatic Speech Translation [18.97234151624098]
We show that by initializing the encoder of the encoder-decoder sequence-to-sequence translation model with SAMU-XLS-R, we achieve significantly better cross-lingual task knowledge transfer.
We demonstrate the effectiveness of our approach on two popular datasets, namely, CoVoST-2 and Europarl.
arXiv Detail & Related papers (2023-06-01T15:19:06Z)
- Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning [48.15259834021655]
We present a pragmatic approach towards building a multilingual machine translation model that covers hundreds of languages.
We use a mixture of supervised and self-supervised objectives, depending on the data availability for different language pairs (see the sketch after this list).
We demonstrate that the synergy between these two training paradigms enables the model to produce high-quality translations in the zero-resource setting.
arXiv Detail & Related papers (2022-01-09T23:36:44Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This effectively avoids the degenerate case of predicting masked words conditioned only on context in the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models into a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs (sketched after this list).
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
- Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [61.88012735215636]
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs.
However, existing UNMT systems can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time.
In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder.
arXiv Detail & Related papers (2020-04-21T17:26:16Z)
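The "Towards the Next 1000 Languages" entry above mentions mixing supervised and self-supervised objectives depending on data availability. Below is a minimal sketch of one plausible way to route examples between a translation loss and a denoising loss; all helper names (`mask_spans`, `model.translation_loss`, `model.denoising_loss`) and the corruption scheme are illustrative assumptions, not that paper's actual recipe.

```python
import random

def mask_spans(tokens, mask_id=3, mask_ratio=0.35):
    """Corrupt a token-id list by masking one random contiguous span
    (a rough MASS/BART-style corruption; details are illustrative)."""
    n = max(1, int(len(tokens) * mask_ratio))
    start = random.randrange(0, max(1, len(tokens) - n + 1))
    return tokens[:start] + [mask_id] * n + tokens[start + n:]

def mixed_objective_loss(model, example, parallel_pairs):
    """Route an example to a supervised or self-supervised loss by data availability.

    `parallel_pairs` is the set of language pairs with parallel data.
    `model.translation_loss`, `model.denoising_loss`, and the `example`
    fields are hypothetical helpers used purely for illustration.
    """
    if (example.src_lang, example.tgt_lang) in parallel_pairs:
        # Parallel data available: ordinary supervised translation cross-entropy.
        return model.translation_loss(example.src_tokens, example.tgt_tokens,
                                      example.src_lang, example.tgt_lang)
    # Monolingual-only data: reconstruct the original sentence from a
    # corrupted copy (denoising objective on the target language).
    corrupted = mask_spans(example.tgt_tokens)
    return model.denoising_loss(corrupted, example.tgt_tokens, example.tgt_lang)
```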
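The "Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation" entry proposes random online backtranslation. The sketch below illustrates the general idea of back-translating the target sentence into a randomly chosen language with the current model and then training on the resulting unseen direction; `model.loss` and `model.translate` are hypothetical helpers rather than an actual library API, and sampling and batching details are omitted.

```python
import random
import torch

def random_online_backtranslation_step(model, batch, languages, optimizer):
    """One training step mixing supervised translation with random online
    backtranslation, to give gradient signal to unseen (zero-shot) pairs.

    Assumes `model.translate(tokens, tgt_lang=...)` decodes with the current
    parameters and `model.loss(src, src_lang, tgt, tgt_lang)` returns the
    translation cross-entropy; both are illustrative helpers.
    """
    src, src_lang, tgt, tgt_lang = batch

    # 1) Regular supervised loss on the observed (src_lang -> tgt_lang) pair.
    loss = model.loss(src, src_lang, tgt, tgt_lang)

    # 2) Pick a random pivot language and back-translate the target into it
    #    with the current model (no gradients through decoding).
    pivot_lang = random.choice([l for l in languages if l != tgt_lang])
    with torch.no_grad():
        synthetic_src = model.translate(tgt, tgt_lang=pivot_lang)

    # 3) Train the possibly-unseen (pivot_lang -> tgt_lang) direction on the
    #    synthetic pair, which is what enforces zero-shot coverage.
    loss = loss + model.loss(synthetic_src, pivot_lang, tgt, tgt_lang)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```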