Impact of Visual Context on Noisy Multimodal NMT: An Empirical Study for
English to Indian Languages
- URL: http://arxiv.org/abs/2308.16075v1
- Date: Wed, 30 Aug 2023 14:52:14 GMT
- Authors: Baban Gain, Dibyanayan Bandyopadhyay, Samrat Mukherjee, Chandranath
Adak, Asif Ekbal
- Abstract summary: The study investigates the effectiveness of utilizing multimodal information in Neural Machine Translation (NMT).
Surprisingly, the study finds that images might be redundant in this context.
Experiments translate from English to Hindi, Bengali, and Malayalam, outperforming state-of-the-art benchmarks significantly.
- Score: 29.416563233407892
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The study investigates the effectiveness of utilizing multimodal information
in Neural Machine Translation (NMT). While prior research focused on using
multimodal data in low-resource scenarios, this study examines how image
features impact translation when added to a large-scale, pre-trained unimodal
NMT system. Surprisingly, the study finds that images might be redundant in
this context. Additionally, the research introduces synthetic noise to assess
whether images help the model deal with textual noise. Multimodal models
slightly outperform text-only models in noisy settings, even with random
images. The study's experiments translate from English to Hindi, Bengali, and
Malayalam, outperforming state-of-the-art benchmarks significantly.
Interestingly, the effect of visual context varies with source text noise: no
visual context works best for non-noisy translations, cropped image features
are optimal for low noise, and full image features work better in high-noise
scenarios. This sheds light on the role of visual context, especially in noisy
settings, opening up a new research direction for Noisy Neural Machine
Translation in multimodal setups. The research emphasizes the importance of
combining visual and textual information for improved translation in various
environments.
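The abstract does not say how the synthetic noise is generated, so the following is only a minimal sketch of one common way to corrupt source text at a controllable level. The function name `add_char_noise`, the `noise_level` parameter, and the choice of character deletion and substitution are all assumptions made for illustration, not the authors' procedure.

```python
import random


def add_char_noise(sentence: str, noise_level: float, seed: int = 0) -> str:
    """Corrupt roughly `noise_level` of the alphabetic characters.

    Hypothetical helper (not from the paper): each selected character is
    either deleted or replaced by a random lowercase ASCII letter.
    Non-alphabetic characters (spaces, digits, punctuation) are kept.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0, 1]")
    rng = random.Random(seed)  # local RNG so results are reproducible
    out = []
    for ch in sentence:
        if ch.isalpha() and rng.random() < noise_level:
            if rng.random() < 0.5:
                continue  # deletion: drop the character
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))  # substitution
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    source = "a man rides a horse on the beach"
    for level in (0.0, 0.1, 0.5):
        print(level, add_char_noise(source, level, seed=42))
```

Sweeping `noise_level` over a few values gives the kind of clean, low-noise, and high-noise source conditions under which text-only and multimodal systems can be compared.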
Related papers
- M2DF: Multi-grained Multi-curriculum Denoising Framework for Multimodal
Aspect-based Sentiment Analysis [32.9772577419091]
Multimodal Aspect-based Sentiment Analysis (MABSA) is a fine-grained Sentiment Analysis task.
We propose a Multi-grained Multi-curriculum Denoising Framework (M2DF) which can achieve denoising by adjusting the order of training data.
Our framework consistently outperforms state-of-the-art work on three sub-tasks of MABSA.
(2023-10-23T06:22:39Z)
- Towards Better Multi-modal Keyphrase Generation via Visual Entity
Enhancement and Multi-granularity Image Noise Filtering [79.44443231700201]
Multi-modal keyphrase generation aims to produce a set of keyphrases that represent the core points of the input text-image pair.
The input text and image are often not perfectly matched, and thus the image may introduce noise into the model.
We propose a novel multi-modal keyphrase generation model, which not only enriches the model input with external knowledge, but also effectively filters image noise.
(2023-09-09T09:41:36Z)
- Scene Graph as Pivoting: Inference-time Image-free Unsupervised
Multimodal Machine Translation with Visual Scene Hallucination [88.74459704391214]
In this work, we investigate a more realistic unsupervised multimodal machine translation (UMMT) setup.
We represent the input images and texts with the visual and language scene graphs (SG), where such fine-grained vision-language features ensure a holistic understanding of the semantics.
Several SG-pivoting based learning objectives are introduced for unsupervised translation training.
Our method outperforms the best-performing baseline by a significant BLEU margin on the task and setup.
(2023-05-20T18:17:20Z)
- Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning [25.230786853723203]
We propose a noise-robust cross-lingual cross-modal retrieval method for low-resource languages.
We use Machine Translation to construct pseudo-parallel sentence pairs for low-resource languages.
We introduce a multi-view self-distillation method to learn noise-robust target-language representations.
(2022-08-26T09:32:24Z)
- Multimodal Neural Machine Translation with Search Engine Based Image
Retrieval [4.662583832063716]
We propose an open-vocabulary image retrieval method to collect descriptive images for a bilingual parallel corpus.
Our proposed method achieves significant improvements over strong baselines.
(2022-07-26T08:42:06Z)
- Vision Matters When It Should: Sanity Checking Multimodal Machine
Translation Models [25.920891392933058]
Multimodal machine translation (MMT) systems have been shown to outperform their text-only neural machine translation (NMT) counterparts when visual context is available.
Recent studies have also shown that the performance of MMT models is only marginally impacted when the associated image is replaced with an unrelated image or noise.
(2021-09-08T03:32:48Z)
- MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase
Grounding [74.33171794972688]
We present algorithms to model phrase-object relevance by leveraging fine-grained visual representations and visually-aware language representations.
Experiments conducted on the widely-adopted Flickr30k dataset show a significant improvement over existing weakly-supervised methods.
(2020-10-12T00:43:52Z)
- Unsupervised Multimodal Neural Machine Translation with Pseudo Visual
Pivoting [105.5303416210736]
Unsupervised machine translation (MT) has recently achieved impressive results with monolingual corpora only.
It is still challenging to associate source-target sentences in the latent space.
As people who speak different languages biologically share similar visual systems, the potential of achieving better alignment through visual content is promising.
(2020-05-06T20:11:46Z)
- Robust Unsupervised Neural Machine Translation with Adversarial
Denoising Training [66.39561682517741]
Unsupervised neural machine translation (UNMT) has attracted great interest in the machine translation community.
The main advantage of UNMT lies in how easily the large amount of training text it requires can be collected.
In this paper, we explicitly take noisy data into consideration for the first time to improve the robustness of UNMT-based systems.
(2020-02-28T05:17:55Z)
- Informative Sample Mining Network for Multi-Domain Image-to-Image
Translation [101.01649070998532]
We show that improving the sample selection strategy is an effective solution for image-to-image translation tasks.
We propose a novel multi-stage sample training scheme to reduce sample hardness while preserving sample informativeness.
(2020-01-05T05:48:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.