Supervised Visual Attention for Simultaneous Multimodal Machine Translation
- URL: http://arxiv.org/abs/2201.09324v1
- Date: Sun, 23 Jan 2022 17:25:57 GMT
- Title: Supervised Visual Attention for Simultaneous Multimodal Machine Translation
- Authors: Veneta Haralampieva, Ozan Caglayan, Lucia Specia
- Abstract summary: We propose the first Transformer-based simultaneous multimodal machine translation (MMT) architecture.
We extend this model with an auxiliary supervision signal that guides its visual attention mechanism using labelled phrase-region alignments.
Our results show that supervised visual attention consistently improves the translation quality of the MMT models.
- Score: 47.18251159303909
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been a surge in research in multimodal machine
translation (MMT), where additional modalities such as images are used to
improve translation quality of textual systems. A particular use for such
multimodal systems is the task of simultaneous machine translation, where
visual context has been shown to complement the partial information provided by
the source sentence, especially in the early phases of translation (Caglayan et
al., 2020a; Imankulova et al., 2020). In this paper, we propose the first
Transformer-based simultaneous MMT architecture, which has not been previously
explored in the field. Additionally, we extend this model with an auxiliary
supervision signal that guides its visual attention mechanism using labelled
phrase-region alignments. We perform comprehensive experiments on three
language directions and conduct thorough quantitative and qualitative analyses
using both automatic metrics and manual inspection. Our results show that (i)
supervised visual attention consistently improves the translation quality of
the MMT models, and (ii) fine-tuning the MMT with supervision loss enabled
leads to better performance than training the MMT from scratch. Compared to the
state-of-the-art, our proposed model achieves improvements of up to 2.3 BLEU
and 3.5 METEOR points.
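The auxiliary supervision signal described above can be pictured as an extra loss term that pushes the decoder's cross-modal attention over image regions towards the labelled phrase-region alignments. The following is a minimal, hypothetical PyTorch sketch of that idea only; the function names, the reference-distribution construction and the 0.5 weighting are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F


def attn_supervision_loss(attn_weights, align_targets, mask):
    # attn_weights:  (batch, tgt_len, num_regions), softmax output of the
    #                decoder's cross-modal attention over image regions
    # align_targets: (batch, tgt_len, num_regions), reference distribution
    #                built from the phrase-region alignment labels (rows sum to 1)
    # mask:          (batch, tgt_len) float, 1 where a target position has a
    #                labelled alignment, 0 elsewhere
    eps = 1e-8
    kl = (align_targets * (torch.log(align_targets + eps)
                           - torch.log(attn_weights + eps))).sum(-1)
    return (kl * mask).sum() / mask.sum().clamp(min=1.0)


def joint_loss(logits, targets, attn_weights, align_targets, mask, lam=0.5):
    # Standard translation cross-entropy plus the weighted attention
    # supervision term; logits are (batch, tgt_len, vocab), targets (batch, tgt_len).
    ce = F.cross_entropy(logits.transpose(1, 2), targets, ignore_index=0)
    return ce + lam * attn_supervision_loss(attn_weights, align_targets, mask)

Under this reading, the same joint objective can also be minimised while fine-tuning an already trained MMT model, which is consistent with finding (ii) above that enabling the supervision loss during fine-tuning outperforms training from scratch.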
Related papers
- Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, consists in adapting a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z)
- Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation [53.342921374639346]
Multimodal machine translation aims to improve translation quality by incorporating information from other modalities, such as vision.
Previous MMT systems mainly focus on better access and use of visual information and tend to validate their methods on image-related datasets.
This paper establishes new methods and new datasets for MMT.
arXiv Detail & Related papers (2022-12-20T15:02:38Z)
- Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation [72.6667341525552]
We present a new MMT approach based on a strong text-only MT model, which uses neural adapters and a novel guided self-attention mechanism.
We also introduce CoMMuTE, a Contrastive Multimodal Translation Evaluation set of ambiguous sentences and their possible translations.
Our approach obtains competitive results compared to strong text-only models on standard English-to-French, English-to-German and English-to-Czech benchmarks.
arXiv Detail & Related papers (2022-12-20T10:18:18Z)
- Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation [33.698254673743904]
We explore two main concepts: (a) adaptive policies to learn a good trade-off between high translation quality and low latency; and (b) visual information to support this process.
We propose a multimodal approach to simultaneous machine translation using reinforcement learning, with strategies to integrate visual and textual information in both the agent and the environment.
arXiv Detail & Related papers (2021-02-22T22:26:22Z)
- Simultaneous Machine Translation with Visual Context [42.88121241096681]
Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible.
We analyse the impact of different multimodal approaches and visual features on state-of-the-art SiMT frameworks.
arXiv Detail & Related papers (2020-09-15T18:19:11Z)
- Dynamic Context-guided Capsule Network for Multimodal Machine Translation [131.37130887834667]
Multimodal machine translation (MMT) mainly focuses on enhancing text-only translation with visual features.
We propose a novel Dynamic Context-guided Capsule Network (DCCN) for MMT.
Experimental results on the Multi30K dataset of English-to-German and English-to-French translation demonstrate the superiority of DCCN.
arXiv Detail & Related papers (2020-09-04T06:18:24Z)
- Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting [105.5303416210736]
Unsupervised machine translation (MT) has recently achieved impressive results with monolingual corpora only.
It is still challenging to associate source-target sentences in the latent space.
As people speaking different languages biologically share similar visual systems, the potential of achieving better alignment through visual content is promising.
arXiv Detail & Related papers (2020-05-06T20:11:46Z)