Viewing Knowledge Transfer in Multilingual Machine Translation Through a
Representational Lens
- URL: http://arxiv.org/abs/2305.11550v3
- Date: Mon, 4 Dec 2023 10:15:37 GMT
- Title: Viewing Knowledge Transfer in Multilingual Machine Translation Through a
Representational Lens
- Authors: David Stap, Vlad Niculae, Christof Monz
- Abstract summary: We introduce Representational Transfer Potential (RTP), which measures representational similarities between languages.
We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly correlated with changes in translation quality.
We develop a novel training scheme, which uses an auxiliary similarity loss that encourages representations to be more invariant across languages.
- Score: 15.283483438956264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We argue that translation quality alone is not a sufficient metric for
measuring knowledge transfer in multilingual neural machine translation. To
support this claim, we introduce Representational Transfer Potential (RTP),
which measures representational similarities between languages. We show that
RTP can measure both positive and negative transfer (interference), and find
that RTP is strongly correlated with changes in translation quality, indicating
that transfer does occur. Furthermore, we investigate data and language
characteristics that are relevant for transfer, and find that multi-parallel
overlap is an important yet under-explored feature. Based on this, we develop a
novel training scheme, which uses an auxiliary similarity loss that encourages
representations to be more invariant across languages by taking advantage of
multi-parallel data. We show that our method yields increased translation
quality for low- and mid-resource languages across multiple data and model
setups.
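The abstract names an auxiliary similarity loss over multi-parallel data but does not spell out its form. As a rough illustration only, the sketch below shows one way such a term could be combined with a translation loss, assuming mean-pooled encoder states as sentence representations and batches containing the same sentences in two source languages. The pooling choice, the cosine-based penalty, and the weight `lambda_sim` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def mean_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Mean-pool encoder states (batch, seq, dim) over non-padding tokens."""
    m = mask.unsqueeze(-1).to(hidden.dtype)
    return (hidden * m).sum(dim=1) / m.sum(dim=1).clamp(min=1e-6)

def similarity_loss(repr_a: torch.Tensor, repr_b: torch.Tensor) -> torch.Tensor:
    """Encourage representations of the same sentence in two languages to be
    close (invariant): mean of (1 - cosine similarity) over the batch."""
    return (1.0 - F.cosine_similarity(repr_a, repr_b, dim=-1)).mean()

# Toy usage with random stand-ins for encoder states of a multi-parallel
# pair: 4 sentences, length 7, hidden size 16, in languages A and B.
hid_a, hid_b = torch.randn(4, 7, 16), torch.randn(4, 7, 16)
mask = torch.ones(4, 7)
loss_sim = similarity_loss(mean_pool(hid_a, mask), mean_pool(hid_b, mask))
# total_loss = nmt_loss + lambda_sim * loss_sim  # lambda_sim: tunable weight
print(loss_sim.item())
```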
Related papers
- Disentangling the Roles of Target-Side Transfer and Regularization in Multilingual Machine Translation [9.838281446902268]
We conduct a large-scale study that varies the auxiliary target-side languages along two dimensions.
We show that linguistically similar target languages exhibit a strong ability to transfer positive knowledge.
As the number of similar target languages grows, this positive transfer is further enhanced, benefiting the main language pairs.
Meanwhile, distant auxiliary target languages can also unexpectedly benefit main language pairs, even with minimal positive transfer ability.
arXiv Detail & Related papers (2024-02-01T10:55:03Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing that explicitly minimizes cross-lingual divergence between latent variables using Optimal Transport (a minimal Sinkhorn sketch appears after this list).
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- Measuring Cross-Lingual Transferability of Multilingual Transformers on Sentence Classification [49.8111760092473]
We propose IGap, a cross-lingual transferability metric for multilingual Transformers on sentence classification tasks.
Experimental results show that IGap outperforms baseline metrics for transferability measurement and transfer-direction ranking.
Our results reveal three findings about cross-lingual transfer, which help us better understand multilingual Transformers.
arXiv Detail & Related papers (2023-05-15T17:05:45Z)
- DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer [15.062937537799005]
The approach treats languages as domains to improve zero-shot transfer.
We show that our approach, DiTTO, significantly outperforms the standard zero-shot fine-tuning method.
Our model enables better cross-lingual transfer than standard fine-tuning methods, even in the few-shot setting.
arXiv Detail & Related papers (2023-03-04T08:42:50Z)
- Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican [4.4096464238164295]
We show that transfer effectiveness is correlated with the amount of training data and the relationship between languages.
We contribute a rule-based French-Haitian orthographic and syntactic engine and a novel method for phonological embedding.
In very low-resource Jamaican MT, code-switching with a transfer language for orthographic resemblance yields a 6.63 BLEU point advantage.
arXiv Detail & Related papers (2022-09-13T20:58:46Z)
- Multi-channel Transformers for Multi-articulatory Sign Language Translation [59.38247587308604]
We tackle the multi-articulatory sign language translation task and propose a novel multi-channel transformer architecture.
The proposed architecture allows both inter- and intra-contextual relationships between different sign articulators to be modelled within the transformer network itself.
arXiv Detail & Related papers (2020-09-01T09:10:55Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism that generates phrase representations from the corresponding token representations (a minimal attentive-pooling sketch appears after this list).
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
- From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers [62.637055980148816]
Massively multilingual transformers pretrained with language modeling objectives have become a de facto default transfer paradigm for NLP.
We show that cross-lingual transfer via massively multilingual transformers is substantially less effective in resource-lean scenarios and for distant languages.
arXiv Detail & Related papers (2020-05-01T22:04:58Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
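The Optimal Transport entry above names the technique but not its mechanics. Below is a minimal, self-contained Sinkhorn sketch that computes an entropy-regularized transport cost between two sets of sentence representations; the uniform marginals, cost normalization, iteration count, and regularization strength `eps` are assumptions for illustration, and this does not reproduce that paper's actual posterior-alignment objective.

```python
import torch

def sinkhorn_cost(x: torch.Tensor, y: torch.Tensor,
                  eps: float = 0.1, n_iters: int = 100) -> torch.Tensor:
    """Entropy-regularized optimal transport cost between two sets of
    representations x (n, d) and y (m, d), with uniform marginals."""
    C = torch.cdist(x, y, p=2) ** 2      # pairwise squared L2 costs
    C = C / C.max()                      # normalize for numerical stability
    n, m = C.shape
    a = torch.full((n,), 1.0 / n)        # uniform source marginal
    b = torch.full((m,), 1.0 / m)        # uniform target marginal
    K = torch.exp(-C / eps)              # Gibbs kernel
    u, v = torch.ones(n), torch.ones(m)
    for _ in range(n_iters):             # alternating marginal-matching scalings
        u = a / (K @ v)
        v = b / (K.t() @ u)
    P = u[:, None] * K * v[None, :]      # approximate transport plan
    return (P * C).sum()                 # expected transport cost under P

# Toy usage: mean-pooled "sentence representations" in two languages;
# the resulting scalar could serve as a differentiable alignment loss.
x, y = torch.randn(8, 16), torch.randn(8, 16)
print(sinkhorn_cost(x, y).item())
```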
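Similarly, for the source phrase representation entry, here is a rough sketch of one way token representations within a phrase span could be attentively pooled into a single phrase vector; the single linear scoring layer and the masking scheme are assumptions for illustration, not the paper's mechanism.

```python
import torch
import torch.nn as nn

class AttentivePhrasePooling(nn.Module):
    """Collapse the token representations inside a phrase span into a
    single phrase vector using learned attention weights."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-token attention logit

    def forward(self, tokens: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, span_len, dim); mask: (batch, span_len), 1 = real token
        logits = self.score(tokens).squeeze(-1)
        logits = logits.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(logits, dim=-1)             # (batch, span_len)
        return (weights.unsqueeze(-1) * tokens).sum(dim=1)  # (batch, dim)

# Toy usage: 2 phrases of up to 5 tokens, hidden size 16.
pool = AttentivePhrasePooling(16)
toks = torch.randn(2, 5, 16)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
print(pool(toks, mask).shape)  # torch.Size([2, 16])
```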
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences arising from its use.