Generative Imagination Elevates Machine Translation
- URL: http://arxiv.org/abs/2009.09654v2
- Date: Tue, 13 Apr 2021 03:02:15 GMT
- Title: Generative Imagination Elevates Machine Translation
- Authors: Quanyu Long, Mingxuan Wang, Lei Li
- Abstract summary: We propose ImagiT, a novel machine translation method via visual imagination.
ImagiT first learns to generate a visual representation from the source sentence, and then utilizes both the source sentence and the "imagined representation" to produce a target translation.
Experiments demonstrate that ImagiT benefits from visual imagination and significantly outperforms the text-only neural machine translation baselines.
- Score: 37.78397666835735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There are common semantics shared across text and images. Given a sentence in a source language, does depicting the visual scene help translation into a target language? Existing multimodal neural machine translation (MNMT) methods require bilingual sentence-image triplets for training and source sentence-image tuples for inference. In this paper, we propose ImagiT, a novel machine translation method via visual imagination. ImagiT first learns to generate a visual representation from the source sentence, and then utilizes both the source sentence and the "imagined representation" to produce a target translation. Unlike previous methods, it needs only the source sentence at inference time. Experiments demonstrate that ImagiT benefits from visual imagination and significantly outperforms text-only neural machine translation baselines. Further analysis reveals that the imagination process in ImagiT helps fill in missing information when the source input is degraded (e.g., partially masked).
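To make the two-stage idea above concrete, here is a minimal, hypothetical PyTorch sketch of an "imagine, then translate" model: learned queries attend over the encoded source to produce imagined visual features, and the decoder attends to the concatenation of the textual and imagined memories. The module names, dimensions, and the attention-based imagination step are illustrative assumptions; the actual ImagiT architecture (including its generative image-synthesis component, positional encodings, and training objectives) is defined in the paper, not here.

```python
# Minimal sketch of the "imagine, then translate" idea described in the abstract.
# All names and sizes are hypothetical; this is not the authors' implementation.
import torch
import torch.nn as nn


class ImaginationModule(nn.Module):
    """Maps encoded source tokens to a fixed-length set of 'imagined' visual features."""

    def __init__(self, d_model=512, n_visual=49, n_heads=8):
        super().__init__()
        # Learned queries act as visual "slots" to be filled from the text.
        self.queries = nn.Parameter(torch.randn(n_visual, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, src_hidden):
        # src_hidden: (batch, src_len, d_model)
        q = self.queries.unsqueeze(0).expand(src_hidden.size(0), -1, -1)
        visual, _ = self.attn(q, src_hidden, src_hidden)  # (batch, n_visual, d_model)
        return visual


class ImagineAndTranslate(nn.Module):
    """Encode the source, imagine visual features, decode the target from both."""

    def __init__(self, vocab_size=32000, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=6)
        self.imagine = ImaginationModule(d_model)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=6)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.encoder(self.embed(src_ids))         # textual memory
        visual = self.imagine(src)                      # imagined visual memory (no image input)
        memory = torch.cat([src, visual], dim=1)        # decoder attends to both modalities
        t = tgt_ids.size(1)
        causal = torch.triu(
            torch.full((t, t), float("-inf"), device=tgt_ids.device), diagonal=1)
        dec = self.decoder(self.embed(tgt_ids), memory, tgt_mask=causal)
        return self.out(dec)                            # (batch, tgt_len, vocab_size)
```

Because the imagined memory is computed from the source text alone, inference requires only `src_ids`, which mirrors the abstract's claim that ImagiT needs no image at test time.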
Related papers
- An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance [53.974497865647336]
We take a first step towards translating images to make them culturally relevant.
We build three pipelines comprising state-of-the-art generative models to do the task.
We conduct a human evaluation of translated images to assess for cultural relevance and meaning preservation.
arXiv Detail & Related papers (2024-04-01T17:08:50Z) - Learning to Imagine: Visually-Augmented Natural Language Generation [73.65760028876943]
We propose a method to make pre-trained language models (PLMs) learn to imagine for visually-augmented natural language generation.
We use a diffusion model to synthesize high-quality images conditioned on the input texts.
We synthesize an image for each sentence rather than generating only one image for an entire paragraph.
arXiv Detail & Related papers (2023-05-26T13:59:45Z) - Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination [88.74459704391214]
In this work, we investigate a more realistic unsupervised multimodal machine translation (UMMT) setup.
We represent the input images and texts with the visual and language scene graphs (SG), where such fine-grained vision-language features ensure a holistic understanding of the semantics.
Several SG-pivoting based learning objectives are introduced for unsupervised translation training.
Our method outperforms the best-performing baseline by significant BLEU margins on this task and setup.
arXiv Detail & Related papers (2023-05-20T18:17:20Z) - Universal Multimodal Representation for Language Understanding [110.98786673598015]
This work presents new methods to employ visual information as assistant signals to general NLP tasks.
For each sentence, we first retrieve a flexible number of images from a light topic-image lookup table extracted from existing sentence-image pairs.
Then, the text and images are encoded by a Transformer encoder and convolutional neural network, respectively.
arXiv Detail & Related papers (2023-01-09T13:54:11Z) - Multimodal Neural Machine Translation with Search Engine Based Image Retrieval [4.662583832063716]
We propose an open-vocabulary image retrieval method to collect descriptive images for a bilingual parallel corpus.
Our proposed method achieves significant improvements over strong baselines.
arXiv Detail & Related papers (2022-07-26T08:42:06Z) - VALHALLA: Visual Hallucination for Machine Translation [64.86515924691899]
We introduce a visual hallucination framework, called VALHALLA.
It requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation.
In particular, given a source sentence, an autoregressive hallucination transformer is used to predict a discrete visual representation from the input text (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-05-31T20:25:15Z) - Neural Machine Translation with Phrase-Level Universal Visual Representations [11.13240570688547]
We propose a phrase-level retrieval-based method for MMT to get visual information for the source input from existing sentence-image data sets.
Our method performs retrieval at the phrase level and hence learns visual information from pairs of source phrase and grounded region.
Experiments show that the proposed method significantly outperforms strong baselines on multiple MMT datasets.
arXiv Detail & Related papers (2022-03-19T11:21:13Z) - Simultaneous Machine Translation with Visual Context [42.88121241096681]
Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible.
We analyse the impact of different multimodal approaches and visual features on state-of-the-art SiMT frameworks.
arXiv Detail & Related papers (2020-09-15T18:19:11Z)
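Several of the entries above (VALHALLA, the scene-graph hallucination work, and ImagiT itself) share the idea of predicting a visual stand-in from text alone. Below is a minimal, hypothetical sketch of the discrete-token variant mentioned in the VALHALLA summary: an autoregressive decoder predicts indices into a visual codebook conditioned on encoded source text. The codebook, sizes, and module names are assumptions; the actual system additionally couples this with a pretrained image quantizer and joint training with the translation model, which is omitted here.

```python
# Rough sketch of text-to-discrete-visual-token hallucination, in the spirit of
# the VALHALLA entry above. Names and sizes are hypothetical assumptions.
import torch
import torch.nn as nn


class HallucinationTransformer(nn.Module):
    """Autoregressively predicts discrete visual tokens (indices into a visual
    codebook) conditioned on encoded source text."""

    def __init__(self, codebook_size=1024, d_model=512, n_visual_tokens=64):
        super().__init__()
        self.n_visual_tokens = n_visual_tokens
        self.bos_id = codebook_size                       # extra index used as start token
        self.tok_embed = nn.Embedding(codebook_size + 1, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=4)
        self.head = nn.Linear(d_model, codebook_size)

    @torch.no_grad()
    def hallucinate(self, text_memory):
        # text_memory: (batch, src_len, d_model) from any text encoder.
        batch = text_memory.size(0)
        tokens = torch.full((batch, 1), self.bos_id, dtype=torch.long,
                            device=text_memory.device)
        for _ in range(self.n_visual_tokens):
            h = self.decoder(self.tok_embed(tokens), text_memory)
            next_tok = self.head(h[:, -1]).argmax(dim=-1, keepdim=True)  # greedy decoding
            tokens = torch.cat([tokens, next_tok], dim=1)
        return tokens[:, 1:]                              # (batch, n_visual_tokens)
```

The hallucinated tokens can then be embedded and concatenated with the text memory, much like the imagined features in the sketch after the abstract, so the translation decoder sees a visual memory even though no real image is supplied at inference time.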
This list is automatically generated from the titles and abstracts of the papers in this site.