Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation
- URL: http://arxiv.org/abs/2102.11387v1
- Date: Mon, 22 Feb 2021 22:26:22 GMT
- Title: Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation
- Authors: Julia Ive, Andy Mingren Li, Yishu Miao, Ozan Caglayan, Pranava Madhyastha, Lucia Specia
- Abstract summary: We explore two main concepts: (a) adaptive policies to learn a good trade-off between high translation quality and low latency; and (b) visual information to support this process.
We propose a multimodal approach to simultaneous machine translation using reinforcement learning, with strategies to integrate visual and textual information in both the agent and the environment.
- Score: 33.698254673743904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of simultaneous machine translation (SiMT)
by exploring two main concepts: (a) adaptive policies to learn a good trade-off
between high translation quality and low latency; and (b) visual information to
support this process by providing additional (visual) contextual information
which may be available before the textual input is produced. For that, we
propose a multimodal approach to simultaneous machine translation using
reinforcement learning, with strategies to integrate visual and textual
information in both the agent and the environment. We provide an exploration of
how different types of visual information and integration strategies affect the
quality and latency of simultaneous translation models, and demonstrate that
visual cues lead to higher quality while keeping the latency low.
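
The abstract frames SiMT as a sequential decision problem: at each step an agent either READs another source token or WRITEs a target token, with visual context available before any text arrives and a reward that trades translation quality against latency. The sketch below illustrates that decision loop; the toy environment, the oracle "translator", the wait-k baseline policy, and the reward weighting are illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass, field

READ, WRITE = 0, 1

@dataclass
class VisualSiMTEnv:
    """Toy SiMT environment: the agent READs source tokens one at a time
    or WRITEs the next target token; a reference translation stands in
    for the translator so the example stays self-contained."""
    source: list          # full source sentence, revealed incrementally
    reference: list       # reference translation (quality oracle)
    image_feats: tuple    # fixed visual context, available from step 0
    read_idx: int = 0     # number of source tokens consumed so far
    written: list = field(default_factory=list)

    def state(self):
        # Visual features sit in the state from the very first step,
        # so the agent can act before any text has been produced.
        return (tuple(self.source[:self.read_idx]),
                tuple(self.written), self.image_feats)

    def step(self, action, lam=0.1):
        """Reward = quality gain - lam * latency penalty, a common SiMT
        shaping (not necessarily the paper's exact reward)."""
        if action == READ and self.read_idx == len(self.source):
            action = WRITE              # source exhausted: must write
        if action == READ:
            self.read_idx += 1
            reward = -lam               # waiting costs latency
        else:
            self.written.append(self.reference[len(self.written)])
            reward = 1.0                # quality proxy: one correct token
        done = len(self.written) == len(self.reference)
        return self.state(), reward, done

def wait_k_policy(state, k=2):
    """Rule-based wait-k baseline: stay k tokens behind the reader. An
    adaptive RL agent would replace this fixed rule with a policy
    learned from the full (text + visual) state."""
    read_tokens, written, _img = state
    return READ if len(read_tokens) - len(written) < k else WRITE

env = VisualSiMTEnv(source="wer hat den ball ?".split(),
                    reference="who has the ball ?".split(),
                    image_feats=(0.12, 0.88))  # stand-in image embedding
state, ret, done = env.state(), 0.0, False
while not done:
    state, reward, done = env.step(wait_k_policy(state))
    ret += reward
print(env.written, f"return={ret:.2f}")  # emits the reference, return=4.50

Replacing wait_k_policy with a learned, state-dependent policy is where the adaptive and multimodal ingredients described in the abstract would enter: a policy network can decide to WRITE earlier when the image already disambiguates the upcoming content.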
Related papers
- WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge [73.76722241704488]
We propose a plug-in framework named WisdoM to leverage the contextual world knowledge induced from the large vision-language models (LVLMs) for enhanced multimodal sentiment analysis.
We show that our approach yields substantial improvements over several state-of-the-art methods.
arXiv Detail & Related papers (2024-01-12T16:08:07Z)
- Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection [72.36017150922504]
We propose a multi-modal contextual knowledge distillation framework, MMC-Det, to transfer the learned contextual knowledge from a teacher fusion transformer to a student detector.
Diverse multi-modal masked language modeling is realized by imposing an object divergence constraint on traditional multi-modal masked language modeling (MLM).
arXiv Detail & Related papers (2023-08-30T08:33:13Z)
- Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective [14.100033405711685]
Multimodal machine translation (MMT) aims to improve translation quality by equipping the source sentence with its corresponding image.
In this paper, we endeavor to improve MMT performance by increasing visual awareness from an information theoretic perspective.
arXiv Detail & Related papers (2022-10-16T08:11:44Z)
- mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections [104.14624185375897]
mPLUG is a new vision-language foundation model for both cross-modal understanding and generation.
It achieves state-of-the-art results on a wide range of vision-language downstream tasks, such as image captioning, image-text retrieval, visual grounding and visual question answering.
arXiv Detail & Related papers (2022-05-24T11:52:06Z)
- Supervised Visual Attention for Simultaneous Multimodal Machine Translation [47.18251159303909]
We propose the first Transformer-based simultaneous multimodal machine translation (MMT) architecture.
We extend this model with an auxiliary supervision signal that guides its visual attention mechanism using labelled phrase-region alignments.
Our results show that supervised visual attention consistently improves the translation quality of the MMT models.
arXiv Detail & Related papers (2022-01-23T17:25:57Z)
- Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task [26.703809355057224]
We conduct a detailed analysis to understand the impact of the auxiliary task on the primary task within the multitask learning framework.
Our analysis confirms that multitask learning tends to generate similar decoder representations from different modalities.
Inspired by these findings, we propose three methods to improve translation quality.
arXiv Detail & Related papers (2021-07-12T23:53:40Z)
- Simultaneous Machine Translation with Visual Context [42.88121241096681]
Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible.
We analyse the impact of different multimodal approaches and visual features on state-of-the-art SiMT frameworks (a sketch of the standard latency metric in this line of work follows this list).
arXiv Detail & Related papers (2020-09-15T18:19:11Z)
- Dynamic Context-guided Capsule Network for Multimodal Machine Translation [131.37130887834667]
Multimodal machine translation (MMT) mainly focuses on enhancing text-only translation with visual features.
We propose a novel Dynamic Context-guided Capsule Network (DCCN) for MMT.
Experimental results on the Multi30K dataset of English-to-German and English-to-French translation demonstrate the superiority of DCCN.
arXiv Detail & Related papers (2020-09-04T06:18:24Z)
- Towards Multimodal Simultaneous Neural Machine Translation [28.536262015508722]
Simultaneous translation involves translating a sentence before the speaker's utterance is completed in order to realize real-time understanding.
This task is significantly more challenging than general full-sentence translation because of the shortage of input information during decoding.
We propose multimodal simultaneous neural machine translation (MSNMT), which leverages visual information as an additional modality.
arXiv Detail & Related papers (2020-04-07T08:02:21Z)
- Learning Coupled Policies for Simultaneous Machine Translation using Imitation Learning [85.70547744787]
We present an approach to efficiently learn a simultaneous translation model with coupled programmer-interpreter policies.
Experiments on six language pairs show that our method outperforms strong baselines in terms of translation quality.
arXiv Detail & Related papers (2020-02-11T10:56:42Z)
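
For context on the latency side of the quality/latency trade-off discussed throughout this list: simultaneous translation latency is commonly reported as Average Lagging (Ma et al., 2019), which measures how far, in source tokens, the writer trails an ideal translator that never waits. A minimal sketch (the function name and 1-indexed convention are my own):

def average_lagging(g, src_len, tgt_len):
    """Average Lagging (Ma et al., 2019). g[t-1] is the number of source
    tokens read before target token t was written; gamma corrects for
    the length mismatch between source and target."""
    gamma = tgt_len / src_len
    # tau: first target step by which the entire source has been read
    tau = next(t for t, g_t in enumerate(g, 1) if g_t == src_len)
    return sum(g[t - 1] - (t - 1) / gamma for t in range(1, tau + 1)) / tau

# The wait-2 rollout sketched earlier reads 2, 3, 4, 5, 5 source tokens
# before its five target tokens, giving a lag of exactly 2 tokens:
print(average_lagging([2, 3, 4, 5, 5], src_len=5, tgt_len=5))  # 2.0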
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.