Fugu-MT Paper Translation (Abstract): Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation

Paper summary: Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation

arxiv url: http://arxiv.org/abs/2309.16599v1
Date: Thu, 28 Sep 2023 17:02:36 GMT
Status: Translation complete
System update date: 2023-09-29 13:17:21.805885
Title: Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation
Title (reference translation): Unlikelihood Tuning on Negative Samples Improves Zero-Shot Translation
Authors: Changtong Zan, Liang Ding, Li Shen, Yibin Lei, Yibing Zhan, Weifeng Liu, Dacheng Tao
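Abstract summary: Zero-shot translation (ZST) aims to translate between language pairs that are unseen in the training data. A common way to guide the zero-shot language mapping during inference is to deliberately insert the source and target language IDs. Recent studies have shown that language IDs sometimes fail to navigate the ZST task, so that translations suffer from the off-target problem.
Reference score (independently computed attention level): 79.96416609433724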
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between language pairs that are unseen in the training data. The common practice for guiding the zero-shot language mapping during inference is to deliberately insert the source and target language IDs, e.g., <EN> for English and <DE> for German. Recent studies have shown that language IDs sometimes fail to navigate the ZST task, so that translations suffer from the off-target problem (non-target-language words appear in the generated translation), which makes it difficult to apply current multilingual translation models to a broad range of zero-shot language scenarios. To understand when and why the navigation capabilities of language IDs are weakened, we compare two extreme decoder input cases in the ZST directions: Off-Target (OFF) and On-Target (ON) cases. By contrastively visualizing the contextual word representations (CWRs) of these cases with teacher forcing, we show that 1) the CWRs of different languages are effectively distributed in separate regions when the sentence and ID are matched (ON setting), and 2) if the sentence and ID are unmatched (OFF setting), the CWRs of different languages are chaotically distributed. Our analyses suggest that although they work well in ideal ON settings, language IDs become fragile and lose their navigation ability when faced with off-target tokens, which commonly exist during inference but are rare in training scenarios. In response, we employ unlikelihood tuning on the negative (OFF) samples to minimize their probability such that the language IDs can discriminate between the on- and off-target tokens during training. Experiments spanning 40 ZST directions show that our method reduces the off-target ratio by 48.0% on average, leading to a +9.1 BLEU improvement with only 0.3% extra tuning cost.
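Abstract (reference translation): Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between language pairs that are unseen in the training data. The common practice for guiding the zero-shot language mapping during inference is to deliberately insert the source and target language IDs, for example <EN> for English and <DE> for German. Recent studies have shown that language IDs sometimes fail to navigate the ZST task and suffer from the off-target problem (words outside the target language appear in the generated translation), so it is difficult to apply current multilingual translation models to a broad range of zero-shot language scenarios. To understand why the navigation ability of language IDs weakens, we compare two extreme decoder input cases in the ZST directions: Off-Target (OFF) and On-Target (ON) cases. By contrastively visualizing the contextual word representations (CWRs) of these cases with teacher forcing, we show that 1) when the sentence and ID are matched (ON setting), the CWRs of different languages are effectively distributed in separate regions, and 2) when the sentence and ID are unmatched (OFF setting), the CWRs of different languages are chaotically distributed. The analyses suggest that language IDs become fragile and lose their navigation ability when faced with off-target tokens, which commonly exist during inference but are rare in training scenarios. In response, we minimize the probability of the negative (OFF) samples so that the language IDs can discriminate between on-target and off-target tokens during training. Experiments spanning 40 ZST directions show that the method reduces the off-target ratio by 48.0% on average and achieves a +9.1 BLEU improvement with only 0.3% additional tuning cost.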
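
The abstract does not spell out the training objective, so the following is only a minimal sketch of how an unlikelihood term on negative (OFF) samples might look. It assumes the commonly used token-level unlikelihood form, -log(1 - p), averaged over the tokens of a negative sample, and it assumes this term is added to an ordinary likelihood term with a weight. The function names, the `alpha` weight, and the averaging are illustrative assumptions, not details taken from the paper.

```python
import math


def likelihood_loss(pos_token_probs, eps=1e-12):
    """Mean of -log(p) over the tokens of a positive (ON) sample.

    pos_token_probs: model probabilities assigned to the reference tokens.
    """
    if not pos_token_probs:
        raise ValueError("need at least one token probability")
    return sum(-math.log(max(p, eps)) for p in pos_token_probs) / len(pos_token_probs)


def unlikelihood_loss(neg_token_probs, eps=1e-12):
    """Mean of -log(1 - p) over the tokens of a negative (OFF) sample.

    neg_token_probs: model probabilities assigned to off-target tokens.
    The loss grows as the model puts more mass on those tokens.
    """
    if not neg_token_probs:
        raise ValueError("need at least one token probability")
    return sum(-math.log(max(1.0 - p, eps)) for p in neg_token_probs) / len(neg_token_probs)


def combined_loss(pos_token_probs, neg_token_probs, alpha=1.0):
    """Likelihood on the ON sample plus a weighted unlikelihood on the OFF sample.

    alpha is a hypothetical weighting hyperparameter (an assumption).
    """
    return likelihood_loss(pos_token_probs) + alpha * unlikelihood_loss(neg_token_probs)


if __name__ == "__main__":
    # A model that still assigns high probability to off-target tokens
    # is penalized more than one that has learned to suppress them.
    print(unlikelihood_loss([0.9, 0.8]))
    print(unlikelihood_loss([0.1, 0.05]))
```

Under these assumptions, minimizing the second term pushes the probability of off-target tokens toward zero while the first term keeps the on-target translation likely, which is the discrimination between on- and off-target tokens that the abstract describes.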

Related papers