Towards Opening the Black Box of Neural Machine Translation: Source and
Target Interpretations of the Transformer
- URL: http://arxiv.org/abs/2205.11631v1
- Date: Mon, 23 May 2022 20:59:14 GMT
- Title: Towards Opening the Black Box of Neural Machine Translation: Source and
Target Interpretations of the Transformer
- Authors: Javier Ferrando, Gerard I. Gállego, Belen Alastruey, Carlos
Escolano, Marta R. Costa-jussà
- Abstract summary: In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix.
Previous work on interpretability in NMT has focused solely on source sentence token attributions.
We propose an interpretability method that tracks complete input token attributions.
- Score: 1.8594711725515678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Neural Machine Translation (NMT), each token prediction is conditioned on
the source sentence and the target prefix (what has been previously translated
at a decoding step). However, previous work on interpretability in NMT has
focused solely on source sentence token attributions. Therefore, we lack a
full understanding of the influences of every input token (source sentence and
target prefix) in the model predictions. In this work, we propose an
interpretability method that tracks complete input token attributions. Our
method, which can be extended to any encoder-decoder Transformer-based model,
allows us to better comprehend the inner workings of current NMT models. We
apply the proposed method to both bilingual and multilingual Transformers and
present insights into their behaviour.
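As a rough illustration of what source and target-prefix attributions look like (this is not the paper's method, which tracks complete token contributions through every layer rather than reading attention maps), the sketch below averages attention weights from an off-the-shelf MarianMT checkpoint at a single decoding step; the checkpoint name and sentences are placeholders.

```python
# Rough sketch (not the paper's exact method): use raw attention weights from a
# Hugging Face MarianMT model as a proxy for source and target-prefix attributions
# at one decoding step. The paper instead tracks complete layer-wise token
# contributions; the checkpoint and sentences here are placeholders.
import torch
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"  # assumed checkpoint, for illustration only
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name, output_attentions=True).eval()

src = tokenizer("The black box is open.", return_tensors="pt")
# Decoder input: start token (Marian uses <pad>) followed by the prefix decoded so far.
prefix = tokenizer("Die", return_tensors="pt", add_special_tokens=False).input_ids
decoder_input_ids = torch.cat(
    [torch.tensor([[model.config.decoder_start_token_id]]), prefix], dim=-1
)

with torch.no_grad():
    out = model(**src, decoder_input_ids=decoder_input_ids)

# Cross-attention, averaged over layers and heads: influence of each source token
# on the last target position (the token about to be predicted).
src_attr = torch.stack(out.cross_attentions).mean(dim=(0, 2))[0, -1]
# Decoder self-attention, same averaging: influence of the target prefix.
tgt_attr = torch.stack(out.decoder_attentions).mean(dim=(0, 2))[0, -1]

for tok, score in zip(tokenizer.convert_ids_to_tokens(src.input_ids[0].tolist()), src_attr.tolist()):
    print(f"source {tok:>12s}: {score:.3f}")
for tok, score in zip(tokenizer.convert_ids_to_tokens(decoder_input_ids[0].tolist()), tgt_attr.tolist()):
    print(f"prefix {tok:>12s}: {score:.3f}")
```

Attention weights are only a coarse proxy; the proposed method instead aggregates the actual token contributions across all layers, which is what allows source and target-prefix influence to be compared on an equal footing.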
Related papers
- Learning Homographic Disambiguation Representation for Neural Machine
Translation [20.242134720005467]
Homographs, words with the same spelling but different meanings, remain challenging in Neural Machine Translation (NMT).
We propose a novel approach that tackles this issue in the latent space.
We first train an encoder (a.k.a. the "homographic-encoder") to learn universal sentence representations on a natural language inference (NLI) task.
We further fine-tune the encoder using homograph-based synsets from WordNet, enabling it to learn word-set representations from sentences.
arXiv Detail & Related papers (2023-04-12T13:42:59Z)
- Transformer Feed-Forward Layers Build Predictions by Promoting Concepts
in the Vocabulary Space [49.029910567673824]
Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction construction process is opaque and largely not understood.
We make a substantial step towards unveiling this underlying prediction process, by reverse-engineering the operation of the feed-forward network (FFN) layers.
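A small, hedged illustration of this kind of reverse-engineering (GPT-2 plus arbitrary layer and neuron indices, not the paper's actual experiments): project a single FFN value vector through the output embedding and list the vocabulary tokens it promotes.

```python
# Hedged illustration, not the paper's experiments: project one FFN "value vector"
# of GPT-2 into vocabulary space and list the tokens it promotes most strongly.
# Model, layer and neuron indices are arbitrary choices for demonstration.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

layer, neuron = 10, 42  # arbitrary FFN layer and hidden unit
# In GPT-2's Conv1D, c_proj.weight has shape (intermediate_dim, hidden_dim),
# so each row is the vector one FFN unit writes into the residual stream.
value_vec = model.transformer.h[layer].mlp.c_proj.weight[neuron]

# Project through the (tied) output embedding to get per-token scores.
logits = model.lm_head.weight @ value_vec
top = torch.topk(logits, k=10)
print([tokenizer.decode([i]) for i in top.indices.tolist()])
```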
arXiv Detail & Related papers (2022-03-28T12:26:00Z)
- Confidence Based Bidirectional Global Context Aware Training Framework
for Neural Machine Translation [74.99653288574892]
We propose a Confidence Based Bidirectional Global Context Aware (CBBGCA) training framework for neural machine translation (NMT).
Our proposed CBBGCA training framework significantly improves the NMT model by +1.02, +1.30 and +0.57 BLEU scores on three large-scale translation datasets.
arXiv Detail & Related papers (2022-02-28T10:24:22Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
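As a toy illustration of the kind of corruption functions being compared (the paper's exact objectives differ, and its context-based replacement variant is omitted here), the snippet below masks tokens for a denoising objective and, as an alternative, locally reorders them so the corrupted input still resembles a full sentence.

```python
# Toy illustration of seq2seq denoising objectives (not the paper's exact corruption
# functions): mask random tokens for a BART/MASS-style objective, or locally reorder
# them so the corrupted input still looks like a full sentence.
import random

MASK = "<mask>"

def mask_tokens(tokens, mask_prob=0.35, rng=random.Random(0)):
    """MLM-style corruption: replace a random subset of tokens with a mask symbol."""
    return [MASK if rng.random() < mask_prob else t for t in tokens]

def shuffle_local(tokens, window=3, rng=random.Random(0)):
    """Alternative corruption: reorder words within small windows, keeping them all."""
    out = list(tokens)
    for i in range(0, len(out), window):
        chunk = out[i:i + window]
        rng.shuffle(chunk)
        out[i:i + window] = chunk
    return out

sentence = "the black box of neural machine translation".split()
print("masked   :", " ".join(mask_tokens(sentence)))
print("reordered:", " ".join(shuffle_local(sentence)))
print("target   :", " ".join(sentence))  # the decoder reconstructs the original
```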
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Source and Target Bidirectional Knowledge Distillation for End-to-end
Speech Translation [88.78138830698173]
We focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models.
We train a bilingual E2E-ST model to predict paraphrased transcriptions as an auxiliary task with a single decoder.
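A minimal sketch of the sequence-level distillation step this builds on (only the data-generation side, not the paper's bidirectional source/target setup; the teacher checkpoint is a placeholder): an external text NMT model translates the transcripts, and its outputs become the training targets for the end-to-end speech translation student.

```python
# Minimal SeqKD data-generation sketch (just the distillation step this work builds
# on, not its bidirectional setup): a text NMT teacher translates the transcripts,
# and its outputs replace the references as training targets for the end-to-end
# speech translation student. The checkpoint name is a placeholder.
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"  # assumed teacher checkpoint
tokenizer = MarianTokenizer.from_pretrained(name)
teacher = MarianMTModel.from_pretrained(name).eval()

transcripts = [
    "thanks for coming today",
    "the meeting starts at nine",
]

batch = tokenizer(transcripts, return_tensors="pt", padding=True)
generated = teacher.generate(**batch, num_beams=5, max_new_tokens=64)
pseudo_targets = tokenizer.batch_decode(generated, skip_special_tokens=True)

# (audio, pseudo_target) pairs would then train the E2E-ST student.
for src, tgt in zip(transcripts, pseudo_targets):
    print(f"{src!r} -> {tgt!r}")
```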
arXiv Detail & Related papers (2021-04-13T19:00:51Z)
- Token Drop mechanism for Neural Machine Translation [12.666468105300002]
We propose Token Drop to improve generalization and avoid overfitting for the NMT model.
It is similar to word dropout, except that we replace the dropped token with a special token instead of setting the word to zero.
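A tiny sketch of the mechanism as summarized above (the drop rate and the replacement id are assumptions): selected token ids are swapped for a special token rather than having their embeddings zeroed as in plain word dropout.

```python
# Tiny sketch of the Token Drop idea as summarized (drop rate and the choice of
# replacement id are assumptions): selected token ids are swapped for a special
# token, rather than zeroing their embeddings as plain word dropout does.
import torch

def token_drop(input_ids, drop_prob=0.15, special_id=3, pad_id=0):
    """Randomly replace non-padding token ids with a special token id."""
    drop = (torch.rand(input_ids.shape) < drop_prob) & (input_ids != pad_id)
    return torch.where(drop, torch.full_like(input_ids, special_id), input_ids)

batch = torch.tensor([[17, 42, 99, 7, 0, 0]])  # toy batch, 0 = padding
print(token_drop(batch))
```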
arXiv Detail & Related papers (2020-10-21T14:02:27Z)
- Universal Vector Neural Machine Translation With Effective Attention [0.0]
We propose a single model for Neural Machine Translation based on the encoder-decoder architecture.
We introduce a neutral/universal model representation that can be used to predict more than one language.
arXiv Detail & Related papers (2020-06-09T01:13:57Z)
- Explicit Reordering for Neural Machine Translation [50.70683739103066]
In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency.
We propose a novel reordering method to explicitly model reordering information for Transformer-based NMT.
The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-04-08T05:28:46Z)
- Learning Contextualized Sentence Representations for Document-Level
Neural Machine Translation [59.191079800436114]
Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence.
We propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence.
arXiv Detail & Related papers (2020-03-30T03:38:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.