Disentangling the Roles of Target-Side Transfer and Regularization in
Multilingual Machine Translation
- URL: http://arxiv.org/abs/2402.01772v1
- Date: Thu, 1 Feb 2024 10:55:03 GMT
- Title: Disentangling the Roles of Target-Side Transfer and Regularization in
Multilingual Machine Translation
- Authors: Yan Meng and Christof Monz
- Abstract summary: We conduct a large-scale study that varies the auxiliary target-side languages along two dimensions.
We show that linguistically similar target languages exhibit a strong ability to transfer positive knowledge.
As the corpus size of similar target languages increases, positive transfer is further enhanced, benefiting the main language pairs.
Meanwhile, distant auxiliary target languages can also unexpectedly benefit main language pairs, even with minimal positive transfer ability.
- Score: 9.838281446902268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual Machine Translation (MMT) benefits from knowledge transfer
across different language pairs. However, improvements in one-to-many
translation compared to many-to-one translation are only marginal and sometimes
even negligible. This performance discrepancy raises the question of the
extent to which positive transfer plays a role on the target side for
one-to-many MT. In this paper, we conduct a large-scale study that varies the
auxiliary target-side languages along two dimensions, i.e., linguistic
similarity and corpus size, to show the dynamic impact of knowledge transfer
on the main language pairs. We show that linguistically similar auxiliary
target languages exhibit a strong ability to transfer positive knowledge. As
the corpus size of similar target languages increases, positive transfer is
further enhanced, benefiting the main language pairs. Meanwhile, we find that
distant auxiliary target languages can also unexpectedly benefit main language
pairs, even with minimal positive transfer ability. Beyond transfer, we show
that distant auxiliary target languages can act as a regularizer, benefiting
translation performance by enhancing generalization and model inference
calibration.
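The calibration claim can be made concrete with a standard measure such as expected calibration error (ECE): a well-calibrated model's token probabilities should match its empirical accuracy. The sketch below is purely illustrative and not the authors' evaluation protocol; the per-token probabilities and correctness flags are hypothetical.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-weighted gap between mean confidence and empirical
    accuracy over equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Hypothetical per-token probabilities and correctness flags from a
# translation model; better calibration yields a lower ECE.
probs = [0.92, 0.85, 0.99, 0.40, 0.77]
hits = [1, 1, 1, 0, 1]
print(expected_calibration_error(probs, hits))
```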
Related papers
- ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Contrastive Framework [79.72910257530795]
ShifCon is a Shift-based Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one.
It shifts the representations of non-dominant languages into the dominant-language subspace, allowing them to access the relatively rich information encoded in the model parameters.
Experiments demonstrate that our ShifCon framework significantly enhances the performance of non-dominant languages.
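As a rough illustration of the shift idea, one can move a non-dominant language's hidden states by the offset between the two languages' mean representations. ShifCon itself learns the shift within a contrastive framework, so this minimal sketch only captures the intuition; all tensors here are hypothetical.

```python
import torch

def mean_shift(non_dominant: torch.Tensor, dominant: torch.Tensor) -> torch.Tensor:
    """Shift non-dominant hidden states by the offset between the two
    languages' mean representations (a crude stand-in for a learned
    shift into the dominant-language subspace)."""
    offset = dominant.mean(dim=0) - non_dominant.mean(dim=0)
    return non_dominant + offset

# Hidden states from one transformer layer: (num_tokens, hidden_dim).
h_sw = torch.randn(128, 768)   # non-dominant language, hypothetical
h_en = torch.randn(256, 768)   # dominant language, hypothetical
h_sw_shifted = mean_shift(h_sw, h_en)
```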
arXiv Detail & Related papers (2024-10-25T10:28:59Z) - An Efficient Approach for Studying Cross-Lingual Transfer in Multilingual Language Models [26.72394783468532]
We propose an efficient method to study the influence of a transfer language on zero-shot performance in another target language.
Our findings suggest that some languages have little effect on others, while other languages, especially ones unseen during pre-training, can be extremely beneficial or detrimental for different target languages.
arXiv Detail & Related papers (2024-03-29T09:52:18Z) - Can Machine Translation Bridge Multilingual Pretraining and Cross-lingual Transfer Learning? [8.630930380973489]
This paper investigates the potential benefits of employing machine translation as a continued training objective to enhance language representation learning.
Our results show that, contrary to expectations, machine translation as a continued training objective fails to enhance cross-lingual representation learning.
We conclude that explicit sentence-level alignment in the cross-lingual scenario is detrimental to cross-lingual transfer.
arXiv Detail & Related papers (2024-03-25T13:53:04Z) - Analysing Cross-Lingual Transfer in Low-Resourced African Named Entity
Recognition [0.10641561702689348]
We investigate the properties of cross-lingual transfer learning between ten low-resourced languages.
We find that models that perform well on a single language often do so at the expense of generalising to others.
The amount of data overlap between the source and target datasets is a better predictor of transfer performance than either the geographical or genetic distance between the languages.
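Data overlap can be approximated as vocabulary overlap between the source and target training sets; the sketch below uses a simple Jaccard measure over tokens, which is one plausible proxy rather than the paper's exact definition, and the example sentences are hypothetical.

```python
def vocab_overlap(source_sents, target_sents):
    """Jaccard overlap between the token vocabularies of two datasets,
    a simple proxy for data overlap (the paper may define overlap
    differently, e.g., at the entity level)."""
    src = {tok for s in source_sents for tok in s.lower().split()}
    tgt = {tok for s in target_sents for tok in s.lower().split()}
    return len(src & tgt) / len(src | tgt)

# Hypothetical source- and target-language training sentences.
print(vocab_overlap(["Kigali ni umujyi munini"], ["Kigali ni umurwa mukuru"]))
```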
arXiv Detail & Related papers (2023-09-11T08:56:47Z) - Viewing Knowledge Transfer in Multilingual Machine Translation Through a
Representational Lens [15.283483438956264]
We introduce Representational Transfer Potential (RTP), which measures representational similarities between languages.
We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly correlated with changes in translation quality.
We develop a novel training scheme, which uses an auxiliary similarity loss that encourages representations to be more invariant across languages.
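A hedged sketch of what such an auxiliary similarity loss could look like: penalizing the cosine distance between mean-pooled encoder states of a parallel sentence pair in two languages. The pooling choice and loss weighting below are assumptions, not the paper's exact training scheme.

```python
import torch
import torch.nn.functional as F

def similarity_loss(enc_a: torch.Tensor, enc_b: torch.Tensor) -> torch.Tensor:
    """Encourage language-invariant representations: 1 - cosine similarity
    between mean-pooled encoder states of a parallel sentence pair."""
    pooled_a = enc_a.mean(dim=1)  # (batch, hidden)
    pooled_b = enc_b.mean(dim=1)
    return (1.0 - F.cosine_similarity(pooled_a, pooled_b, dim=-1)).mean()

# Hypothetical encoder outputs for the same batch rendered in two
# languages: (batch, seq_len, hidden).
loss = similarity_loss(torch.randn(8, 20, 512), torch.randn(8, 24, 512))
# total_loss = ce_loss + lambda_sim * loss  # lambda_sim: hypothetical weight
```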
arXiv Detail & Related papers (2023-05-19T09:36:48Z) - Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is
It and How Does It Affect Transfer? [50.48082721476612]
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability.
We investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages.
arXiv Detail & Related papers (2022-12-21T09:44:08Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
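One generic way to quantify such bias in embeddings is a WEAT-style association score: the difference in mean cosine similarity between a target word and two gendered attribute sets. The paper proposes its own metrics, so the sketch below, with hypothetical vectors, is only an illustration of the general idea.

```python
import numpy as np

def association_bias(word_vec, male_attrs, female_attrs):
    """Difference in mean cosine similarity to male vs. female attribute
    vectors; positive values indicate a male-leaning association."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return (np.mean([cos(word_vec, m) for m in male_attrs])
            - np.mean([cos(word_vec, f) for f in female_attrs]))

# Hypothetical embeddings for a profession word and gendered attribute words.
rng = np.random.default_rng(0)
print(association_bias(rng.normal(size=300),
                       [rng.normal(size=300) for _ in range(3)],
                       [rng.normal(size=300) for _ in range(3)]))
```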
arXiv Detail & Related papers (2020-05-02T04:34:37Z) - From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual
Transfer with Multilingual Transformers [62.637055980148816]
Massively multilingual transformers pretrained with language modeling objectives have become the de facto default transfer paradigm for NLP.
We show that cross-lingual transfer via massively multilingual transformers is substantially less effective in resource-lean scenarios and for distant languages.
arXiv Detail & Related papers (2020-05-01T22:04:58Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
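In outline, singular vector CCA (SVCCA) first reduces each representation matrix to the top singular directions that explain most of its variance, then computes canonical correlations between the two reduced views. Below is a compact NumPy sketch under assumed defaults (99% variance kept, QR-based CCA); it is not the paper's exact code, and the input matrices are hypothetical.

```python
import numpy as np

def svcca(X, Y, var_kept=0.99):
    """SVCCA: SVD-reduce each view, then return canonical correlations."""
    def reduce(M):
        M = M - M.mean(axis=0)                       # center columns
        U, s, _ = np.linalg.svd(M, full_matrices=False)
        k = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), var_kept) + 1
        return U[:, :k] * s[:k]                      # keep top-k directions
    Xr, Yr = reduce(X), reduce(Y)
    qx, _ = np.linalg.qr(Xr)                         # orthonormal bases
    qy, _ = np.linalg.qr(Yr)
    return np.linalg.svd(qx.T @ qy, compute_uv=False)  # canonical correlations

# Rows: shared data points; columns: representation features (hypothetical).
corrs = svcca(np.random.randn(100, 50), np.random.randn(100, 40))
```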
arXiv Detail & Related papers (2020-04-30T16:25:39Z) - Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
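The lexical-overlap effect is straightforward to quantify. A minimal sketch, with hypothetical sentence pairs, measuring the fraction of hypothesis tokens that also occur in the premise:

```python
def lexical_overlap(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis tokens that also appear in the premise."""
    p = set(premise.lower().split())
    h = hypothesis.lower().split()
    return sum(tok in p for tok in h) / max(len(h), 1)

# An original pair vs. an independently translated pair (hypothetical):
print(lexical_overlap("a man is playing a guitar", "a man plays a guitar"))
print(lexical_overlap("a man is playing a guitar", "someone performs music"))
```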
arXiv Detail & Related papers (2020-04-09T17:54:30Z)