Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot
Translation
- URL: http://arxiv.org/abs/2309.16599v1
- Date: Thu, 28 Sep 2023 17:02:36 GMT
- Title: Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot
Translation
- Authors: Changtong Zan, Liang Ding, Li Shen, Yibin Lei, Yibing Zhan, Weifeng
Liu, Dacheng Tao
- Abstract summary: Zero-shot translation (ZST) aims to translate between unseen language pairs in training data.
The common practice to guide the zero-shot language mapping during inference is to deliberately insert the source and target language IDs.
Recent studies have shown that language IDs sometimes fail to navigate the ZST task, causing translations to suffer from the off-target problem.
- Score: 79.96416609433724
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Zero-shot translation (ZST), which is generally based on a multilingual
neural machine translation model, aims to translate between unseen language
pairs in training data. The common practice to guide the zero-shot language
mapping during inference is to deliberately insert the source and target
language IDs, e.g., <EN> for English and <DE> for German. Recent studies have
shown that language IDs sometimes fail to navigate the ZST task, causing the
model to suffer from the off-target problem (non-target-language words appear
in the generated translation) and therefore making it difficult to apply
current multilingual translation models to a broad range of zero-shot language
scenarios. To understand when and why the navigation capabilities of language
IDs are weakened, we compare two extreme decoder input cases in the ZST
directions: Off-Target (OFF) and On-Target (ON) cases. By contrastively
visualizing the contextual word representations (CWRs) of these cases with
teacher forcing, we show that 1) the CWRs of different languages are
effectively distributed in separate regions when the sentence and ID are
matched (ON setting), and 2) if the sentence and ID are unmatched (OFF
setting), the CWRs of different languages are chaotically distributed. Our
analyses suggest that although they work well in ideal ON settings, language
IDs become fragile and lose their navigation ability when faced with off-target
tokens, which commonly exist during inference but are rare in training
scenarios. In response, we employ unlikelihood tuning on the negative (OFF)
samples to minimize their probability such that the language IDs can
discriminate between the on- and off-target tokens during training. Experiments
spanning 40 ZST directions show that our method reduces the off-target ratio by
48.0% on average, leading to a +9.1 BLEU improvement with only 0.3% extra
tuning cost.
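To make the objective concrete, the following is a minimal PyTorch-style sketch of unlikelihood tuning on negative (OFF) samples. It assumes the usual teacher-forced setup in which ON and OFF batches differ only in whether the sentence matches the inserted language ID (e.g., <DE>); the function names, the padding mask, and the alpha weight are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def likelihood_loss(log_probs, on_target_tokens, pad_id):
    # Standard MLE term on matched (ON) samples: log_probs are the decoder's
    # log-softmax outputs with the correct target language ID inserted.
    return F.nll_loss(
        log_probs.view(-1, log_probs.size(-1)),
        on_target_tokens.view(-1),
        ignore_index=pad_id,
    )


def unlikelihood_loss(log_probs, off_target_tokens, pad_id, eps=1e-6):
    # Unlikelihood term on negative (OFF) samples: minimize the probability
    # assigned to off-target tokens via -log(1 - p(y_neg)).
    probs = log_probs.exp()  # (batch, seq_len, vocab)
    neg_probs = probs.gather(-1, off_target_tokens.unsqueeze(-1)).squeeze(-1)
    mask = off_target_tokens.ne(pad_id).float()
    loss = -torch.log((1.0 - neg_probs).clamp_min(eps)) * mask
    return loss.sum() / mask.sum().clamp_min(1.0)


def joint_objective(log_probs_on, on_tokens, log_probs_off, off_tokens,
                    pad_id, alpha=1.0):
    # Keep the usual translation loss on ON samples and add the unlikelihood
    # penalty on OFF samples; alpha is an assumed weighting, not a paper value.
    return (likelihood_loss(log_probs_on, on_tokens, pad_id)
            + alpha * unlikelihood_loss(log_probs_off, off_tokens, pad_id))
```

The intent mirrors the abstract at a high level: the likelihood term preserves behavior on matched sentence/ID pairs, while the unlikelihood term pushes probability mass away from off-target tokens so the language IDs remain discriminative.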
Related papers
- ChatZero: Zero-shot Cross-Lingual Dialogue Generation via Pseudo-Target Language [53.8622516025736]
We propose ChatZero, a novel end-to-end zero-shot dialogue generation model based on a cross-lingual code-switching method.
Experiments on the multilingual DailyDialog and DSTC7-AVSD datasets demonstrate that ChatZero can achieve more than 90% of the original performance.
arXiv Detail & Related papers (2024-08-16T13:11:53Z)
- Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation [16.368747052909214]
We introduce the identity pair, a sentence translated into itself, to address the lack of a base measure in multilingual investigations.
We demonstrate that the encoder transfers the source language to the representational subspace of the target language instead of the language-agnostic state.
Based on our findings, we propose two methods: 1) low-rank language-specific embedding at the encoder, and 2) language-specific contrastive learning of the representation at the decoder.
arXiv Detail & Related papers (2024-06-12T11:16:30Z)
- On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation [104.85258654917297]
We find that failing to encode a discriminative target-language signal leads to off-target translation and a closer lexical distance between languages.
We propose Language Aware Vocabulary Sharing (LAVS) to construct the multilingual vocabulary.
We conduct experiments on a multilingual machine translation benchmark in 11 languages.
arXiv Detail & Related papers (2023-05-18T12:43:31Z)
- Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages [18.210880703295253]
We finetune pretrained language models (PLMs) on seven languages from three different families.
We analyze their zero-shot performance on closely related, non-standardized varieties.
Overall, we find that the similarity between the percentage of words that get split into subwords in the source and target data is the strongest predictor for model performance on target data.
arXiv Detail & Related papers (2023-04-20T08:32:34Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach that regularizes NMT models at both the representation level and the gradient level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Zero-shot Speech Translation [0.0]
Speech Translation (ST) is the task of translating speech in one language into text in another language.
End-to-end approaches use only one system to avoid error propagation, yet are difficult to employ due to data scarcity.
We explore zero-shot translation, which enables translating a pair of languages that is unseen during training.
arXiv Detail & Related papers (2021-07-13T12:00:44Z)
- Improving Zero-Shot Translation by Disentangling Positional Information [24.02434897109097]
We show that a main factor causing language-specific representations is the positional correspondence to input tokens.
We gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions.
arXiv Detail & Related papers (2020-12-30T12:20:41Z)
- Inducing Language-Agnostic Multilingual Representations [61.97381112847459]
Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world.
We examine three approaches for this: (i) re-aligning the vector spaces of target languages to a pivot source language; (ii) removing language-specific means and variances, which yields better discriminativeness of embeddings as a by-product (a minimal sketch of this step appears after this list); and (iii) increasing input similarity across languages by removing morphological contractions and sentence reordering.
arXiv Detail & Related papers (2020-08-20T17:58:56Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
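As a companion to the "removing language-specific means and variances" idea in the Inducing Language-Agnostic Multilingual Representations entry above, here is a hedged NumPy sketch of per-language standardization of sentence embeddings; the variable names and the exact normalization are assumptions made for illustration, not that paper's procedure.

```python
import numpy as np


def remove_language_statistics(embeddings_by_lang, eps=1e-8):
    # Standardize each language's embeddings with its own mean and variance,
    # so that language identity is harder to read off the representations.
    normalized = {}
    for lang, emb in embeddings_by_lang.items():  # emb: (num_sentences, dim)
        mean = emb.mean(axis=0, keepdims=True)
        std = emb.std(axis=0, keepdims=True)
        normalized[lang] = (emb - mean) / (std + eps)
    return normalized


# Toy usage with random vectors standing in for encoder sentence embeddings.
rng = np.random.default_rng(0)
embeddings = {
    "en": rng.normal(0.5, 1.0, (100, 512)),
    "de": rng.normal(-0.3, 2.0, (100, 512)),
}
embeddings = remove_language_statistics(embeddings)
```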
This list is automatically generated from the titles and abstracts of the papers on this site.