Hierarchical Transformer for Multilingual Machine Translation
- URL: http://arxiv.org/abs/2103.03589v1
- Date: Fri, 5 Mar 2021 10:51:47 GMT
- Title: Hierarchical Transformer for Multilingual Machine Translation
- Authors: Albina Khusainova, Adil Khan, Adín Ramírez Rivera, Vitaly Romanov
- Abstract summary: The choice of parameter sharing strategy in multilingual machine translation models determines how optimally the parameter space is used.
Inspired by linguistic trees that show the degree of relatedness between languages, a new general approach to parameter sharing in multilingual machine translation was recently suggested.
We demonstrate that, with a carefully chosen training strategy, the hierarchical architecture can outperform bilingual models and multilingual models with full parameter sharing.
- Score: 3.441021278275805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The choice of parameter sharing strategy in multilingual machine translation models determines how optimally the parameter space is used and hence directly influences ultimate translation quality. Inspired by linguistic trees that show the degree of relatedness between languages, a new general approach to parameter sharing in multilingual machine translation was recently suggested. The main idea is to use these expert language hierarchies as a basis for the multilingual architecture: the closer two languages are, the more parameters they share. In this work, we test this idea using the Transformer architecture and show that, despite the success reported in previous work, training such hierarchical models involves inherent problems. We demonstrate that, with a carefully chosen training strategy, the hierarchical architecture can outperform bilingual models and multilingual models with full parameter sharing.
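To make the sharing idea concrete, below is a minimal sketch of tree-based parameter sharing: each node of a linguistic family tree owns one encoder layer, and a sentence in language L passes through the layers of all nodes on L's root-to-leaf path, so closely related languages reuse most of their layers. This is an illustrative assumption-based sketch, not the authors' actual architecture; all names (`HierarchicalEncoder`, `shared_path`, the example tree) are hypothetical.

```python
# Sketch only: tree-based parameter sharing for a multilingual encoder.
# Not the paper's implementation; module and function names are hypothetical.
import torch.nn as nn


def shared_path(tree, lang):
    """Return the ancestor chain of a language, root first, e.g. ['root', 'slavic', 'ru'].
    `tree` maps each node to its parent; the root maps to None."""
    path, node = [], lang
    while node is not None:
        path.append(node)
        node = tree[node]
    return list(reversed(path))


class HierarchicalEncoder(nn.Module):
    """One Transformer encoder layer per tree node; the closer two languages
    are in the tree, the more layers (parameters) they share."""

    def __init__(self, tree, d_model=512, nhead=8):
        super().__init__()
        self.tree = tree
        nodes = set(tree) | {p for p in tree.values() if p is not None}
        self.layers = nn.ModuleDict({
            node: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for node in nodes
        })

    def forward(self, x, lang):
        # x: (batch, seq_len, d_model); apply shared layers root -> leaf.
        for node in shared_path(self.tree, lang):
            x = self.layers[node](x)
        return x


# Example tree: Russian and Ukrainian share the 'root' and 'slavic' layers,
# while German shares only the 'root' layer with them.
tree = {"root": None, "slavic": "root", "germanic": "root",
        "ru": "slavic", "uk": "slavic", "de": "germanic"}
encoder = HierarchicalEncoder(tree)
```

Under this kind of scheme, the training strategy matters because shallow (shared) layers receive gradients from all languages while leaf layers see only one language's data, which the abstract identifies as a source of training difficulty.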
Related papers
- Understanding the role of FFNs in driving multilingual behaviour in LLMs [0.0]
In this paper, we conduct an in-depth analysis of the multilingual capabilities of a family of Large Language Models.
We introduce novel metrics to probe the model's multilingual behaviour at different layers and shed light on the impact of architectural choices on multilingual processing.
arXiv Detail & Related papers (2024-04-22T03:47:00Z) - Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z) - Building Multilingual Machine Translation Systems That Serve Arbitrary X-Y Translations [75.73028056136778]
We show how to practically build MNMT systems that serve arbitrary X-Y translation directions.
We also examine our proposed approach in an extremely large-scale data setting to accommodate practical deployment scenarios.
arXiv Detail & Related papers (2022-06-30T02:18:15Z) - Examining Scaling and Transfer of Language Model Architectures for Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, but few studies have investigated the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z) - Parameter Differentiation based Multilingual Neural Machine Translation [37.16691633466614]
Multilingual neural machine translation (MNMT) aims to translate multiple languages with a single model.
It is still an open question which parameters should be shared and which ones need to be task-specific.
We propose a novel parameter differentiation based method that allows the model to determine which parameters should be language-specific.
arXiv Detail & Related papers (2021-12-27T11:41:52Z) - Breaking Down Multilingual Machine Translation [74.24795388967907]
We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs).
Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al.
arXiv Detail & Related papers (2021-10-15T14:57:12Z) - Specializing Multilingual Language Models: An Empirical Study [50.7526245872855]
Contextualized word representations from pretrained multilingual language models have become the de facto standard for addressing natural language tasks.
For languages rarely or never seen by these models, directly using such models often results in suboptimal representation or use of data.
arXiv Detail & Related papers (2021-06-16T18:13:55Z) - Adaptive Sparse Transformer for Multilingual Translation [18.017674093519332]
A known challenge of multilingual models is negative language interference.
We propose an adaptive and sparse architecture for multilingual modeling.
Our model outperforms strong baselines in terms of translation quality without increasing the inference cost.
arXiv Detail & Related papers (2021-04-15T10:31:07Z) - Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study the effectiveness of multilingual language models to understand their capability and adaptability to the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z) - A Framework for Hierarchical Multilingual Machine Translation [3.04585143845864]
This paper presents a hierarchical framework for building multilingual machine translation strategies.
It takes advantage of a typological language family tree to enable transfer among similar languages.
Exhaustive experimentation on a dataset with 41 languages demonstrates the validity of the proposed framework.
arXiv Detail & Related papers (2020-05-12T01:24:43Z) - UDapter: Language Adaptation for Truly Universal Dependency Parsing [6.346772579930929]
Cross-language interference and restrained model capacity remain major obstacles to universal multilingual dependency parsing.
We propose a novel multilingual task adaptation approach based on contextual parameter generation and adapter modules.
The resulting UDapter outperforms strong monolingual and multilingual baselines on the majority of both high-resource and low-resource (zero-shot) languages.
arXiv Detail & Related papers (2020-04-29T16:52:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.