Informative Language Representation Learning for Massively Multilingual
Neural Machine Translation
- URL: http://arxiv.org/abs/2209.01530v1
- Date: Sun, 4 Sep 2022 04:27:17 GMT
- Authors: Renren Jin and Deyi Xiong
- Abstract summary: In a multilingual neural machine translation model, an artificial language token is usually used to guide translation into the desired target language.
Recent studies show that prepending language tokens sometimes fails to steer multilingual neural machine translation models in the right translation directions.
We propose two methods, language embedding embodiment and language-aware multi-head attention, to learn informative language representations that channel translation in the right directions.
- Score: 47.19129812325682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a multilingual neural machine translation model that fully shares
parameters across all languages, an artificial language token is usually used
to guide translation into the desired target language. However, recent studies
show that prepending language tokens sometimes fails to steer the
multilingual neural machine translation model in the right translation
directions, especially on zero-shot translation. To mitigate this issue, we
propose two methods, language embedding embodiment and language-aware
multi-head attention, to learn informative language representations that channel
translation in the right directions. The former embodies language embeddings at
different critical switching points along the information flow from the source
to the target, aiming at amplifying translation direction guiding signals. The
latter exploits a matrix, instead of a vector, to represent a language in the
continuous space. The matrix is chunked into multiple heads so as to learn
language representations in multiple subspaces. Experimental results on two
datasets for massively multilingual neural machine translation demonstrate that
language-aware multi-head attention benefits both supervised and zero-shot
translation and significantly alleviates the off-target translation issue.
Further linguistic typology prediction experiments show that matrix-based
language representations learned by our methods are capable of capturing rich
linguistic typology features.
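The second method above represents each language as a matrix that is chunked across attention heads. The sketch below is one plausible, minimal realization in numpy: each head's chunk of the language matrix is appended as an extra key/value slot in that head's attention. The injection point, function names, and shapes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def language_aware_attention(x, lang_matrix, num_heads):
    """x: (seq_len, d_model) hidden states; lang_matrix: (num_heads, head_dim)
    matrix-valued language representation, one row (subspace) per head."""
    seq_len, d_model = x.shape
    head_dim = d_model // num_heads
    assert lang_matrix.shape == (num_heads, head_dim)
    # split hidden states into heads: (num_heads, seq_len, head_dim)
    heads = x.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)
    outputs = []
    for h in range(num_heads):
        q = k = v = heads[h]
        # append this head's language vector as an extra key/value slot,
        # letting every position attend to the language representation
        lang = lang_matrix[h:h + 1]                 # (1, head_dim)
        k_ext = np.concatenate([k, lang], axis=0)   # (seq_len + 1, head_dim)
        v_ext = np.concatenate([v, lang], axis=0)
        scores = q @ k_ext.T / np.sqrt(head_dim)
        outputs.append(softmax(scores) @ v_ext)
    return np.concatenate(outputs, axis=1)          # (seq_len, d_model)
```

Because each head sees only its own row of the language matrix, the language representation is learned in multiple subspaces rather than as a single shared vector.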
Related papers
- Towards a Deep Understanding of Multilingual End-to-End Speech
Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z)
- Automatic Discrimination of Human and Neural Machine Translation in
Multilingual Scenarios [4.631167282648452]
We tackle the task of automatically discriminating between human and machine translations.
We perform experiments in a multilingual setting, considering multiple languages and multilingual pretrained language models.
arXiv Detail & Related papers (2023-05-31T11:41:24Z)
- The Geometry of Multilingual Language Model Representations [25.880639246639323]
We assess how multilingual language models maintain a shared multilingual representation space while still encoding language-sensitive information in each language.
The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies.
We visualize representations projected onto language-sensitive and language-neutral axes, identifying language family and part-of-speech clusters, along with spirals, toruses, and curves representing token position information.
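The language-sensitive axes described above can be illustrated with a small numpy sketch: compute each language's subspace mean, take the direction from the overall mean to each language mean as that language's axis, and project representations onto it. This is a hypothetical simplification for illustration; the paper's actual axis-finding procedure may differ.

```python
import numpy as np

def language_axes(reps_by_lang):
    """reps_by_lang: dict mapping language -> (n_i, d) array of hidden states.
    Returns one unit-norm language-sensitive axis per language."""
    means = {lang: r.mean(axis=0) for lang, r in reps_by_lang.items()}
    overall = np.mean(list(means.values()), axis=0)
    axes = {}
    for lang, m in means.items():
        v = m - overall                    # direction toward this language's mean
        axes[lang] = v / np.linalg.norm(v)
    return axes

def project(reps, axis):
    """Scalar coordinate of each representation along one axis."""
    return reps @ axis
```

Projecting a corpus onto such axes is one way to visualize how language identity separates from language-neutral content in the representation space.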
arXiv Detail & Related papers (2022-05-22T23:58:24Z)
- The Reality of Multi-Lingual Machine Translation [3.183845608678763]
"The Reality of Multi-Lingual Machine Translation" discusses the benefits and perils of using more than two languages in machine translation systems.
The authors present machine translation as a prime example of deep learning applications.
arXiv Detail & Related papers (2022-02-25T16:44:06Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Vokenization: Improving Language Understanding with Contextualized,
Visual-Grounded Supervision [110.66085917826648]
We develop a technique that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images.
"Vokenization" is trained on relatively small image captioning datasets; we then apply it to generate vokens for large language corpora.
Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks.
arXiv Detail & Related papers (2020-10-14T02:11:51Z)
- CSTNet: Contrastive Speech Translation Network for Self-Supervised
Speech Representation Learning [11.552745999302905]
More than half of the 7,000 languages in the world are in imminent danger of going extinct.
It is relatively easy to obtain textual translations corresponding to speech.
We construct a convolutional neural network audio encoder capable of extracting linguistic representations from speech.
arXiv Detail & Related papers (2020-06-04T12:21:48Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
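The singular vector canonical correlation analysis (SVCCA) mentioned in the last entry above can be sketched roughly as follows: reduce each view of the representations to its top singular directions, then compute canonical correlations between the reduced views. This is a simplified numpy version for illustration, not the authors' implementation.

```python
import numpy as np

def svcca(X, Y, k):
    """Singular-vector CCA between two representation matrices.
    X: (n, d1), Y: (n, d2), rows aligned; k: singular directions to keep.
    Returns the k canonical correlations (descending, in [0, 1])."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # SV step: keep each view's top-k singular directions
    Ux, sx, _ = np.linalg.svd(X, full_matrices=False)
    Uy, sy, _ = np.linalg.svd(Y, full_matrices=False)
    Xk = Ux[:, :k] * sx[:k]
    Yk = Uy[:, :k] * sy[:k]
    # CCA step: canonical correlations between two subspaces are the
    # singular values of Qx^T Qy for orthonormal bases Qx, Qy
    Qx, _ = np.linalg.qr(Xk)
    Qy, _ = np.linalg.qr(Yk)
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)
```

Identical views yield correlations of 1; unrelated views yield correlations near 0, so the resulting spectrum summarizes how much information the two representation sources share.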
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.