Semantics-aware Attention Improves Neural Machine Translation
- URL: http://arxiv.org/abs/2110.06920v1
- Date: Wed, 13 Oct 2021 17:58:22 GMT
- Title: Semantics-aware Attention Improves Neural Machine Translation
- Authors: Aviv Slobodkin, Leshem Choshen, Omri Abend
- Abstract summary: We propose two novel parameter-free methods for injecting semantic information into Transformers.
One such method operates on the encoder, through a Scene-Aware Self-Attention (SASA) head.
The other operates on the decoder, through a Scene-Aware Cross-Attention (SACrA) head.
- Score: 35.32217580058933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of syntactic structures into Transformer machine translation
has shown positive results, but to our knowledge, no work has attempted to do
so with semantic structures. In this work we propose two novel parameter-free
methods for injecting semantic information into Transformers, both relying on
semantics-aware masking of (some of) the attention heads. One such method
operates on the encoder, through a Scene-Aware Self-Attention (SASA) head.
The other operates on the decoder, through a Scene-Aware Cross-Attention (SACrA) head. We
show a consistent improvement over the vanilla Transformer and syntax-aware
models for four language pairs. We further show an additional gain when using
both semantic and syntactic structures in some language pairs.
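As a rough illustration of the masking idea described above, the sketch below restricts one self-attention head so that each source token attends only to tokens in its own semantic scene. The per-token `scene_ids` input and the single-head formulation are assumptions made for this example, not the paper's exact SASA/SACrA construction.

```python
# Minimal sketch of scene-aware self-attention masking (illustrative, not the
# authors' exact formulation). `scene_ids` is an assumed per-token scene index,
# e.g. derived from a semantic (UCCA-style) parse of the source sentence.
import torch
import torch.nn.functional as F

def scene_aware_self_attention(q, k, v, scene_ids):
    """q, k, v: (batch, seq, d_head); scene_ids: (batch, seq) long tensor."""
    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5                 # (batch, seq, seq)
    same_scene = scene_ids.unsqueeze(-1) == scene_ids.unsqueeze(-2)  # True within a scene
    scores = scores.masked_fill(~same_scene, float("-inf"))          # block cross-scene attention
    return F.softmax(scores, dim=-1) @ v

# Toy usage: a 5-token sentence split into 2 scenes.
q = k = v = torch.randn(1, 5, 64)
scene_ids = torch.tensor([[0, 0, 0, 1, 1]])
print(scene_aware_self_attention(q, k, v, scene_ids).shape)  # torch.Size([1, 5, 64])
```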
Related papers
- Is context all you need? Scaling Neural Sign Language Translation to Large Domains of Discourse [34.70927441846784]
Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos.
We propose a novel multi-modal transformer architecture that tackles the translation task in a context-aware manner, as a human would.
We report significant improvements on state-of-the-art translation performance using contextual information, nearly doubling the reported BLEU-4 scores of baseline approaches.
arXiv Detail & Related papers (2023-08-18T15:27:22Z)
- Is Encoder-Decoder Redundant for Neural Machine Translation? [44.37101354412253]
The encoder-decoder architecture is still the de facto neural network architecture for state-of-the-art models.
In this work, we experiment with bilingual translation, translation with additional target monolingual data, and multilingual translation.
This alternative approach performs on par with the baseline encoder-decoder Transformer, suggesting that an encoder-decoder architecture might be redundant for neural machine translation.
arXiv Detail & Related papers (2022-10-21T08:33:55Z)
- Syntax-guided Localized Self-attention by Constituency Syntactic Distance [26.141356981833862]
We propose a syntax-guided localized self-attention for Transformer.
It allows directly incorporating grammatical structures from an external constituency parse.
Experimental results show that our model can consistently improve translation performance.
arXiv Detail & Related papers (2022-10-21T06:37:25Z)
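A hedged sketch of the localized-attention idea in the entry above: attention between two tokens is dropped when their syntactic distance exceeds a window. The precomputed pairwise distance matrix `syn_dist` (e.g., derived from a constituency parse) and the window size are assumptions for illustration, not the paper's exact method.

```python
# Sketch of syntax-guided localized self-attention: mask out attention between
# tokens whose syntactic distance exceeds a threshold.
import torch
import torch.nn.functional as F

def localized_attention(q, k, v, syn_dist, max_dist=2):
    """q, k, v: (batch, seq, d); syn_dist: (batch, seq, seq) pairwise tree distances."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(syn_dist > max_dist, float("-inf"))  # keep local syntax window
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 4, 32)
syn_dist = torch.tensor([[[0, 1, 3, 3],
                          [1, 0, 2, 2],
                          [3, 2, 0, 1],
                          [3, 2, 1, 0]]])
print(localized_attention(q, k, v, syn_dist).shape)  # torch.Size([1, 4, 32])
```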
- Can Transformer be Too Compositional? Analysing Idiom Processing in Neural Machine Translation [55.52888815590317]
Unlike literal expressions, idioms' meanings do not directly follow from their parts.
NMT models are often unable to translate idioms accurately and over-generate compositional, literal translations.
We investigate whether the non-compositionality of idioms is reflected in the mechanics of the dominant NMT model, Transformer.
arXiv Detail & Related papers (2022-05-30T17:59:32Z)
- LAVT: Language-Aware Vision Transformer for Referring Image Segmentation [80.54244087314025]
We show that better cross-modal alignments can be achieved through the early fusion of linguistic and visual features in a vision Transformer encoder network.
Our method surpasses the previous state-of-the-art methods on RefCOCO, RefCOCO+, and G-Ref by large margins.
arXiv Detail & Related papers (2021-12-04T04:53:35Z)
- Transferring Semantic Knowledge Into Language Encoders [6.85316573653194]
We introduce semantic form mid-tuning, an approach for transferring semantic knowledge from semantic meaning representations into language encoders.
We show that this alignment can be learned implicitly via classification or directly via triplet loss.
Our method yields language encoders that demonstrate improved predictive performance across inference, reading comprehension, textual similarity, and other semantic tasks.
arXiv Detail & Related papers (2021-10-14T14:11:12Z)
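As a minimal illustration of the triplet-loss alignment mentioned in the entry above: a sentence encoding (anchor) is pulled toward the encoding of its own semantic representation (positive) and pushed away from a mismatched one (negative). The toy linear encoders and feature sizes are placeholders, not the paper's setup.

```python
# Toy triplet-loss alignment between sentence encodings and semantic-form encodings.
import torch
import torch.nn as nn

sent_encoder = nn.Linear(300, 128)   # stand-in for a language encoder
form_encoder = nn.Linear(300, 128)   # stand-in for a semantic-form encoder
loss_fn = nn.TripletMarginLoss(margin=1.0)

sent = torch.randn(8, 300)           # batch of sentence features
pos_form = torch.randn(8, 300)       # matching semantic-form features
neg_form = torch.randn(8, 300)       # mismatched semantic-form features

loss = loss_fn(sent_encoder(sent), form_encoder(pos_form), form_encoder(neg_form))
loss.backward()                      # pulls matching pairs together, pushes mismatches apart
print(loss.item())
```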
- Episodic Transformer for Vision-and-Language Navigation [142.6236659368177]
This paper focuses on addressing two challenges: handling long sequences of subtasks, and understanding complex human instructions.
We propose Episodic Transformer (E.T.), a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.
Our approach sets a new state of the art on the challenging ALFRED benchmark, achieving 38.4% and 8.5% task success rates on seen and unseen test splits.
arXiv Detail & Related papers (2021-05-13T17:51:46Z)
- Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation [71.54816893482457]
We introduce the dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST).
Our models are based on the original Transformer architecture but consist of two decoders, each responsible for one task (ASR or ST).
arXiv Detail & Related papers (2020-11-02T04:59:50Z)
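A rough sketch, under assumed dimensions, of the shared-encoder/two-decoder layout described in the entry above: speech features are encoded once, and separate decoders produce the ASR transcript and the translation. Layer counts, d_model, and the toy inputs are illustrative only, and any interaction between the two decoders is omitted.

```python
# One shared encoder over speech features, two task-specific decoders (ASR and ST).
import torch
import torch.nn as nn

d_model = 256
enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
dec_layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)

encoder = nn.TransformerEncoder(enc_layer, num_layers=4)      # shared speech encoder
asr_decoder = nn.TransformerDecoder(dec_layer, num_layers=4)  # transcription decoder
st_decoder = nn.TransformerDecoder(dec_layer, num_layers=4)   # translation decoder

speech = torch.randn(2, 100, d_model)   # (batch, frames, d_model) acoustic features
asr_tgt = torch.randn(2, 20, d_model)   # embedded transcript prefix
st_tgt = torch.randn(2, 25, d_model)    # embedded translation prefix

memory = encoder(speech)                # encode once, decode twice
asr_out = asr_decoder(asr_tgt, memory)
st_out = st_decoder(st_tgt, memory)
print(asr_out.shape, st_out.shape)      # (2, 20, 256) (2, 25, 256)
```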
- Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation [73.11214377092121]
We propose to replace all but one attention head of each encoder layer with simple fixed -- non-learnable -- attentive patterns.
Our experiments with different data sizes and multiple language pairs show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality.
arXiv Detail & Related papers (2020-02-24T13:53:06Z)
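A small sketch of the fixed, non-learnable attention pattern idea from the last entry: one head's attention distribution is hard-coded (here, each token attends to the previous token) rather than computed from learned query/key projections. The specific pattern and shapes are assumptions for illustration.

```python
# One hard-coded attentive pattern: every token attends only to the previous token.
import torch

def fixed_previous_token_attention(v):
    """v: (batch, seq, d). Returns each position's value copied from the previous token."""
    batch, seq, d = v.shape
    attn = torch.zeros(seq, seq)
    attn[0, 0] = 1.0                   # first token attends to itself
    for i in range(1, seq):
        attn[i, i - 1] = 1.0           # token i attends only to token i-1
    return attn @ v                    # (seq, seq) x (batch, seq, d) broadcasts over batch

v = torch.randn(2, 6, 32)
print(fixed_previous_token_attention(v).shape)  # torch.Size([2, 6, 32])
```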