Related papers: Adaptive Sparse Transformer for Multilingual Translation

Adaptive Sparse Transformer for Multilingual Translation

URL: http://arxiv.org/abs/2104.07358v1
Date: Thu, 15 Apr 2021 10:31:07 GMT
Title: Adaptive Sparse Transformer for Multilingual Translation
Authors: Hongyu Gong, Xian Li, Dmitriy Genzel
Abstract summary: A known challenge of multilingual models is the negative language interference. We propose an adaptive and sparse architecture for multilingual modeling. Our model outperforms strong baselines in terms of translation quality without increasing the inference cost.
Score: 18.017674093519332
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multilingual machine translation has attracted much attention recently due to its support of knowledge transfer among languages and the low cost of training and deployment compared with numerous bilingual models. A known challenge of multilingual models is the negative language interference. In order to enhance the translation quality, deeper and wider architectures are applied to multilingual modeling for larger model capacity, which suffers from the increased inference cost at the same time. It has been pointed out in recent studies that parameters shared among languages are the cause of interference while they may also enable positive transfer. Based on these insights, we propose an adaptive and sparse architecture for multilingual modeling, and train the model to learn shared and language-specific parameters to improve the positive transfer and mitigate the interference. The sparse architecture only activates a subnetwork which preserves inference efficiency, and the adaptive design selects different subnetworks based on the input languages. Evaluated on multilingual translation across multiple public datasets, our model outperforms strong baselines in terms of translation quality without increasing the inference cost.

Related papers

Analyzing and Improving Cross-lingual Knowledge Transfer for Machine Translation [5.878901309908815]
We study cross-lingual knowledge transfer in neural models and develop methods to improve robustness and generalization in multilingual settings.<n>We examine the role of language diversity during training and show that increasing translation coverage improves generalization and reduces off-target behavior.
arXiv Detail & Related papers (2026-01-07T15:51:54Z)
AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought [19.692743208974296]
We introduce AdaCoT (Adaptive Chain-of-Thought), a framework that enhances multilingual reasoning. AdaCoT dynamically routing thought processes through intermediary "thinking languages" before generating target-language responses.
arXiv Detail & Related papers (2025-01-27T15:48:57Z)
ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Multilingual Contrastive Framework [78.07201802874529]
ShifCon is a Shift-based multilingual Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one.<n>Experiments demonstrate that our ShifCon framework significantly enhances the performance of non-dominant languages.
arXiv Detail & Related papers (2024-10-25T10:28:59Z)
Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment [42.624862172666624]
We propose a simple yet effective cross-lingual alignment framework exploiting pairs of translation sentences. It aligns the internal sentence representations across different languages via multilingual contrastive learning. Experimental results show that even with less than 0.1 textperthousand of pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative language models.
arXiv Detail & Related papers (2023-11-14T11:24:08Z)
On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models. We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
Data-Efficient Cross-Lingual Transfer with Language-Specific Subnetworks [16.8212280804151]
Large multilingual language models typically share their parameters across all languages, which enables cross-lingual task transfer. We propose novel methods for using language-specificworks, which control cross-lingual parameter sharing. We combine our methods with meta-learning, an established, but complementary, technique for improving cross-lingual transfer.
arXiv Detail & Related papers (2022-10-31T19:23:33Z)
Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation [129.99918589405675]
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks. Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive. We propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer.
arXiv Detail & Related papers (2022-09-30T05:02:42Z)
Improving the Cross-Lingual Generalisation in Visual Question Answering [40.86774711775718]
multilingual vision-language pretrained models show poor cross-lingual generalisation when applied to non-English data. In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task. We improve cross-lingual transfer with three strategies: (1) we introduce a linguistic prior objective to augment the cross-entropy loss with a similarity-based loss to guide the model during training, (2) we learn a task-specific subnetwork that improves cross-lingual generalisation and reduces variance without model modification, and (3) we augment training examples using synthetic code
arXiv Detail & Related papers (2022-09-07T08:07:43Z)
High-resource Language-specific Training for Multilingual Neural Machine Translation [109.31892935605192]
We propose the multilingual translation model with the high-resource language-specific training (HLT-MT) to alleviate the negative interference. Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder. HLT-MT is further trained on all available corpora to transfer knowledge from high-resource languages to low-resource languages.
arXiv Detail & Related papers (2022-07-11T14:33:13Z)
Multilingual Neural Machine Translation:Can Linguistic Hierarchies Help? [29.01386302441015]
Multilingual Neural Machine Translation (MNMT) trains a single NMT model that supports translation between multiple languages. The performance of an MNMT model is highly dependent on the type of languages used in training, as transferring knowledge from a diverse set of languages degrades the translation performance due to negative transfer. We propose a Hierarchical Knowledge Distillation (HKD) approach for MNMT which capitalises on language groups generated according to typological features and phylogeny of languages to overcome the issue of negative transfer.
arXiv Detail & Related papers (2021-10-15T02:31:48Z)
Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis. We cluster all the target languages into multiple groups and name each group as a representation sprachbund. Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment [59.995385574274785]
We show that, contrary to previous belief, negative interference also impacts low-resource languages. We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
arXiv Detail & Related papers (2020-10-06T20:48:58Z)
Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics. We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.