Fugu-MT paper translation (abstract): Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers

Paper summary: Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers

arxiv url: http://arxiv.org/abs/2003.09586v2
Date: Tue, 20 Apr 2021 00:31:13 GMT
Status: translation complete
Last updated in system: 2022-12-21 12:58:31.218708
Title: Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers
Title (reference translation): Probing word translation in the Transformer and trading decoder layers for encoder layers
Authors: Hongfei Xu and Josef van Genabith and Qiuhui Liu and Deyi Xiong
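Abstract summary: How word translation evolves across Transformer layers has not yet been investigated. Translation already happens progressively in the encoder layers and even in the input embeddings. Experiments show that speed can be increased by up to a factor of 2.3 with small gains in translation quality, and that an 18-4 deep encoder configuration improves translation quality by +1.42 BLEU (En-De) at a speed-up of 1.4.
Reference score (independently computed attention score): 69.40942736249397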
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Due to its effectiveness and performance, the Transformer translation model has attracted wide attention, most recently in terms of probing-based approaches. Previous work focuses on using or probing source linguistic features in the encoder. To date, the way word translation evolves in Transformer layers has not yet been investigated. Naively, one might assume that encoder layers capture source information while decoder layers translate. In this work, we show that this is not quite the case: translation already happens progressively in encoder layers and even in the input embeddings. More surprisingly, we find that some of the lower decoder layers do not actually do that much decoding. We show all of this in terms of a probing approach where we project representations of the layer analyzed to the final trained and frozen classifier level of the Transformer decoder to measure word translation accuracy. Our findings motivate and explain a Transformer configuration change: if translation already happens in the encoder layers, perhaps we can increase the number of encoder layers, while decreasing the number of decoder layers, boosting decoding speed, without loss in translation quality? Our experiments show that this is indeed the case: we can increase speed by up to a factor 2.3 with small gains in translation quality, while an 18-4 deep encoder configuration boosts translation quality by +1.42 BLEU (En-De) at a speed-up of 1.4.
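Abstract (reference translation): Because of its effectiveness and performance, the Transformer translation model has attracted wide attention, most recently through probing-based approaches. Previous work focused on using or probing source linguistic features in the encoder. To date, how word translation evolves across Transformer layers has not been studied. Naively, one would assume that encoder layers capture source information and decoder layers translate. Translation, however, already happens progressively in the encoder layers and even in the input embeddings. More surprisingly, some of the lower decoder layers do not actually do much decoding. We show all of this with a probing approach that projects the representations of the analyzed layer to the final trained and frozen classifier level of the Transformer decoder and measures word translation accuracy. If translation already happens in the encoder layers, perhaps the number of encoder layers can be increased and the number of decoder layers decreased, boosting decoding speed without loss in translation quality? Speed can be increased by up to a factor of 2.3 with small gains in translation quality, while an 18-4 deep encoder configuration improves translation quality by +1.42 BLEU (En-De) at a speed-up of 1.4.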
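
The probing idea in the abstract (project the analyzed layer's representations to the frozen classifier of the decoder, then measure word translation accuracy) can be illustrated with a minimal sketch. It is not the authors' implementation. The names `matvec` and `probe_accuracy` are hypothetical. The sketch also assumes that the projection matrix has already been obtained and that each layer state is already paired with its reference target word; the abstract does not say how either is done.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(matrix: Matrix, vector: Vector) -> List[float]:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def probe_accuracy(
    layer_states: Sequence[Vector],
    projection: Matrix,
    classifier: Matrix,
    target_ids: Sequence[int],
) -> float:
    """Word translation accuracy of one analyzed layer (illustrative only).

    layer_states: one vector per target position, taken from the analyzed
                  layer (assumed here to be aligned with target_ids).
    projection:   probe matrix mapping layer states into the space expected
                  by the classifier (assumed to be given).
    classifier:   frozen output layer, one row of weights per vocabulary entry.
    target_ids:   reference target word ids.
    """
    if len(layer_states) != len(target_ids):
        raise ValueError("need exactly one target id per layer state")
    if not target_ids:
        raise ValueError("need at least one position to score")

    correct = 0
    for state, target in zip(layer_states, target_ids):
        projected = matvec(projection, state)
        logits = matvec(classifier, projected)
        # Predicted word: index of the highest-scoring vocabulary entry.
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += int(predicted == target)
    return correct / len(target_ids)


if __name__ == "__main__":
    # Toy data: 4-dimensional states, 4-word vocabulary, identity matrices.
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    states = [[2.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    print(probe_accuracy(states, identity, identity, [0, 1, 2]))
```

Only the classifier is held fixed in this sketch, so a higher accuracy at a given layer would suggest that more of the word translation is already recoverable from that layer's representations.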

Related papers