Causes and Cures for Interference in Multilingual Translation
- URL: http://arxiv.org/abs/2212.07530v3
- Date: Fri, 19 May 2023 12:26:50 GMT
- Title: Causes and Cures for Interference in Multilingual Translation
- Authors: Uri Shaham and Maha Elbayad and Vedanuj Goswami and Omer Levy and
Shruti Bhosale
- Abstract summary: This work identifies the main factors that contribute to interference in multilingual machine translation.
We observe that substantial interference occurs mainly when the model is very small with respect to the available training data.
Tuning the sampling temperature to control the proportion of each language pair in the data is key to balancing the amount of interference between low- and high-resource language pairs.
- Score: 44.98751458618928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual machine translation models can benefit from synergy between
different language pairs, but also suffer from interference. While there is a
growing number of sophisticated methods that aim to eliminate interference, our
understanding of interference as a phenomenon is still limited. This work
identifies the main factors that contribute to interference in multilingual
machine translation. Through systematic experimentation, we find that
interference (or synergy) is primarily determined by model size, data size,
and the proportion of each language pair within the total dataset. We observe
that substantial interference occurs mainly when the model is very small with
respect to the available training data, and that using standard transformer
configurations with less than one billion parameters largely alleviates
interference and promotes synergy. Moreover, we show that tuning the sampling
temperature to control the proportion of each language pair in the data is key
to balancing the amount of interference between low and high resource language
pairs effectively, and can lead to superior performance overall.
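The sampling-temperature knob mentioned above is typically implemented by raising each language pair's data share to the power 1/T and renormalizing; T = 1 keeps the natural proportions, while larger T upsamples low-resource pairs. A minimal sketch, assuming this standard formulation (the pair sizes below are illustrative, not from the paper):

```python
# Minimal sketch of temperature-based sampling for multilingual training data.
# The language-pair sizes below are illustrative, not taken from the paper.

def temperature_sampling_probs(pair_sizes, temperature=1.0):
    """Map raw example counts to sampling probabilities p_i proportional to (n_i / sum_j n_j)**(1/T)."""
    total = sum(pair_sizes.values())
    unnormalized = {pair: (count / total) ** (1.0 / temperature)
                    for pair, count in pair_sizes.items()}
    norm = sum(unnormalized.values())
    return {pair: weight / norm for pair, weight in unnormalized.items()}

# T = 1 reproduces the natural data proportions; larger T flattens the
# distribution and upsamples low-resource pairs.
sizes = {"en-fr": 40_000_000, "en-de": 4_000_000, "en-gu": 100_000}
print(temperature_sampling_probs(sizes, temperature=1.0))
print(temperature_sampling_probs(sizes, temperature=5.0))
```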
Related papers
- Interference Matrix: Quantifying Cross-Lingual Interference in Transformer Encoders [55.749883010057545]
We construct an interference matrix by training and evaluating small BERT-like models on all possible language pairs.
Our analysis reveals that interference between languages is asymmetrical and that its patterns do not align with traditional linguistic characteristics.
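One way to read this construction: for every ordered language pair, compare a small bilingual model's score on the target language against a monolingual baseline; the signed difference fills one cell of the matrix. A hypothetical sketch, with train_and_evaluate standing in for actually training and scoring the BERT-like models:

```python
# Hypothetical sketch of building a pairwise interference matrix.
# train_and_evaluate is a stand-in for training a small BERT-like model on the
# given languages and scoring it on the evaluation language.
import itertools
import random

def train_and_evaluate(train_langs, eval_lang):
    # Placeholder: returns a deterministic fake score; a real implementation
    # would train an encoder on train_langs and evaluate it on eval_lang.
    random.seed(hash((tuple(sorted(train_langs)), eval_lang)) % (2**32))
    return random.uniform(0.6, 0.9)

languages = ["en", "fr", "de", "hi"]
interference = {}
for target, partner in itertools.permutations(languages, 2):
    mono = train_and_evaluate([target], eval_lang=target)
    joint = train_and_evaluate([target, partner], eval_lang=target)
    # Negative values indicate interference from the partner language on the
    # target; positive values indicate synergy. The matrix is generally asymmetric.
    interference[(target, partner)] = joint - mono

print(interference)
```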
arXiv Detail & Related papers (2025-08-04T10:02:19Z)
- Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models [1.2027959564488593]
Large Transformer-based language models achieve remarkable performance but remain opaque in how they plan, structure, and realize text.
We introduce Multi-Scale Probabilistic Generation Theory (MSPGT), a hierarchical framework that factorizes generation into three semantic scales: global context, intermediate structure, and local word choices.
arXiv Detail & Related papers (2025-05-23T16:55:35Z)
- Causal Message Passing for Experiments with Unknown and General Network Interference [5.294604210205507]
We introduce a new framework to accommodate complex and unknown network interference.
Our framework, termed causal message-passing, is grounded in high-dimensional approximate message passing methodology.
We demonstrate the effectiveness of this approach across five numerical scenarios.
arXiv Detail & Related papers (2023-11-14T17:31:50Z)
- Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter [21.512817959760007]
Federated Multilingual Neural Machine Translation (Fed-MNMT) has emerged as a promising paradigm for institutions with limited language resources.
This approach allows multiple institutions to act as clients and train a unified model through model synchronization, rather than collecting sensitive data for centralized training.
However, as pre-trained language models (PLMs) continue to increase in size, the communication cost for transmitting parameters during synchronization has become a training speed bottleneck.
We propose a communication-efficient Fed-MNMT framework that addresses this issue by keeping PLMs frozen and only transferring lightweight adapter modules between clients.
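The communication saving comes from keeping the PLM backbone frozen on every client and synchronizing only the small adapter tensors. A simplified sketch of one federated round under that setup (NumPy arrays and the local_update/federated_average helpers are illustrative stand-ins, not the paper's code):

```python
# Simplified sketch of one Fed-MNMT round in which only adapter parameters are
# exchanged. NumPy arrays stand in for real adapter weights; the frozen PLM
# backbone is never transmitted.
import numpy as np

def local_update(adapter, client_gradient, lr=0.1):
    # Placeholder for local training on the client's private bilingual data.
    return {name: w - lr * client_gradient[name] for name, w in adapter.items()}

def federated_average(client_adapters):
    # Server aggregates only the lightweight adapter tensors.
    names = client_adapters[0].keys()
    return {name: np.mean([c[name] for c in client_adapters], axis=0)
            for name in names}

global_adapter = {"down_proj": np.zeros((512, 64)), "up_proj": np.zeros((64, 512))}
client_adapters = []
for _ in range(3):  # three clients with synthetic gradients
    fake_grad = {name: np.random.randn(*w.shape) for name, w in global_adapter.items()}
    client_adapters.append(local_update(dict(global_adapter), fake_grad))

global_adapter = federated_average(client_adapters)
print({name: w.shape for name, w in global_adapter.items()})
```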
arXiv Detail & Related papers (2023-05-21T12:48:38Z)
- Scaling Laws for Multilingual Neural Machine Translation [45.620062316968976]
We study how increases in the model size affect the model performance and investigate the role of the training mixture composition on the scaling behavior.
We find that changing the weightings of the individual language pairs in the training mixture only affects the multiplicative factor of the scaling law.
We leverage our observations to predict the performance of multilingual models trained with any language weighting at any scale.
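A compact reading of these findings: per-pair test loss follows a power law in model size, and a pair's training-mixture weight shows up only in the multiplicative prefactor, not in the exponent. A hedged sketch under that assumption (the function, constants, and the exact weight dependence are illustrative, not the paper's fitted law):

```python
# Illustrative per-language-pair scaling law: L_p(N) = beta_p * N**(-alpha) + L_inf.
# Per the summary above, changing a pair's mixture weight only rescales beta_p;
# the exponent alpha is shared. All constants here are made up.

def predicted_loss(model_size, mixture_weight, alpha=0.3, beta0=50.0, irreducible=1.5):
    beta_p = beta0 * mixture_weight ** (-0.2)  # weight affects only the prefactor
    return beta_p * model_size ** (-alpha) + irreducible

for n in (1e7, 1e8, 1e9):
    print(f"N={n:.0e}  w=0.1 -> {predicted_loss(n, 0.1):.3f}   "
          f"w=0.5 -> {predicted_loss(n, 0.5):.3f}")
```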
arXiv Detail & Related papers (2023-02-19T18:43:24Z)
- Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers [54.4919139401528]
We show that it is possible to reduce interference by identifying and pruning language-specific parameters.
We show that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction.
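The recipe implied here is to score each attention head's contribution to a target language and then drop heads whose contribution is negative. A toy sketch using Monte Carlo Shapley estimation, with evaluate_with_heads as a hypothetical placeholder for masking heads in a trained model and measuring target-language performance:

```python
# Toy Monte Carlo estimate of per-head Shapley values, followed by pruning
# heads with negative contribution. evaluate_with_heads is a placeholder
# scoring function, not the paper's implementation.
import random

HEADS = list(range(8))

def evaluate_with_heads(active_heads):
    # Placeholder: pretend even-indexed heads help and odd-indexed heads hurt.
    return sum(0.1 if h % 2 == 0 else -0.05 for h in active_heads)

def shapley_values(heads, samples=200):
    values = {h: 0.0 for h in heads}
    for _ in range(samples):
        order = random.sample(heads, len(heads))  # random permutation of heads
        active, prev_score = [], evaluate_with_heads([])
        for h in order:
            active.append(h)
            score = evaluate_with_heads(active)
            values[h] += (score - prev_score) / samples  # marginal contribution
            prev_score = score
    return values

values = shapley_values(HEADS)
kept = [h for h in HEADS if values[h] >= 0]  # prune heads with negative contribution
print(values)
print("kept heads:", kept)
```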
arXiv Detail & Related papers (2022-10-11T18:11:37Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
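The objective can be read as a min-max game: the model minimizes a weighted combination of per-language-pair losses while an adversary shifts weight toward the worst-performing pairs, and iterated best response alternates the two updates. A toy sketch under that reading (the losses and the exponentiated-gradient step are synthetic stand-ins, not the paper's exact scheme):

```python
# Toy sketch of distributionally robust reweighting via iterated best response:
# the adversary upweights language pairs with high loss (exponentiated-gradient
# step), and the "model update" here is a synthetic loss reduction proportional
# to each pair's weight. Not the paper's exact algorithm.
import math

losses = {"en-fr": 1.2, "en-de": 1.5, "en-gu": 3.0}       # synthetic per-pair losses
weights = {pair: 1.0 / len(losses) for pair in losses}     # start uniform
eta = 0.5                                                  # adversary step size

for step in range(10):
    # Adversary best response: shift weight toward high-loss pairs.
    unnorm = {pair: w * math.exp(eta * losses[pair]) for pair, w in weights.items()}
    total = sum(unnorm.values())
    weights = {pair: w / total for pair, w in unnorm.items()}

    # Stand-in for the model's best response: training more on a pair lowers its loss.
    losses = {pair: max(0.5, loss - 0.2 * weights[pair]) for pair, loss in losses.items()}

print(weights)
print(losses)
```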
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
- Uncertainty-Aware Balancing for Multilingual and Multi-Domain Neural Machine Translation Training [58.72619374790418]
MultiUAT dynamically adjusts the training data usage based on the model's uncertainty.
We analyze cross-domain transfer and show the deficiency of static and similarity-based methods.
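The underlying idea is uncertainty-aware data balancing: corpora on which the model is more uncertain receive larger sampling weights, refreshed as training progresses. A minimal sketch, assuming a simple proportional weighting (the uncertainty values are synthetic; MultiUAT's actual estimators differ):

```python
# Hedged sketch of uncertainty-aware sampling weights: corpora with higher
# model uncertainty are sampled proportionally more often. The uncertainty
# values are synthetic placeholders for quantities such as average dev-set
# entropy, re-estimated periodically during training.

def uncertainty_weights(uncertainty_by_corpus):
    total = sum(uncertainty_by_corpus.values())
    return {corpus: u / total for corpus, u in uncertainty_by_corpus.items()}

uncertainty = {"en-de:news": 2.1, "en-de:medical": 3.4, "en-fr:news": 1.8}
print(uncertainty_weights(uncertainty))
```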
arXiv Detail & Related papers (2021-09-06T08:30:33Z)
- Adaptive Sparse Transformer for Multilingual Translation [18.017674093519332]
A known challenge of multilingual models is negative language interference.
We propose an adaptive and sparse architecture for multilingual modeling.
Our model outperforms strong baselines in terms of translation quality without increasing the inference cost.
arXiv Detail & Related papers (2021-04-15T10:31:07Z)
- On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment [59.995385574274785]
We show that, contrary to previous belief, negative interference also impacts low-resource languages.
We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
arXiv Detail & Related papers (2020-10-06T20:48:58Z)
- Modeling Voting for System Combination in Machine Translation [92.09572642019145]
We propose an approach to modeling voting for system combination in machine translation.
Our approach combines the advantages of statistical and neural methods since it can not only analyze the relations between hypotheses but also allow for end-to-end training.
arXiv Detail & Related papers (2020-07-14T09:59:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.