Cross-lingual Visual Pre-training for Multimodal Machine Translation
- URL: http://arxiv.org/abs/2101.10044v1
- Date: Mon, 25 Jan 2021 12:46:41 GMT
- Title: Cross-lingual Visual Pre-training for Multimodal Machine Translation
- Authors: Ozan Caglayan, Menekse Kuyu, Mustafa Sercan Amac, Pranava Madhyastha,
Erkut Erdem, Aykut Erdem, Lucia Specia
- Abstract summary: We combine cross-lingual and visual pre-training methods to learn cross-lingual representations.
We show that when fine-tuned for multimodal machine translation, these models obtain state-of-the-art performance.
- Score: 36.4592103797139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models have been shown to substantially improve
performance on many natural language tasks. Although the early focus of such models
was single language pre-training, recent advances have resulted in
cross-lingual and visual pre-training methods. In this paper, we combine these
two approaches to learn visually-grounded cross-lingual representations.
Specifically, we extend translation language modelling (Lample and Conneau,
2019) with masked region classification and perform pre-training on three-way
parallel vision & language corpora. We show that when fine-tuned for multimodal
machine translation, these models obtain state-of-the-art performance. We also
provide qualitative insights into the usefulness of the learned grounded
representations.
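The combined objective described in the abstract can be sketched as follows. This is a hypothetical illustration only, not the authors' code: the function name, masking rates, and the `[SEP]`/`[MASK]` conventions are assumptions. The idea is that source and target sentences are concatenated into one stream so the model attends across languages, masked text positions are predicted from the bilingual and visual context, and masked image regions are predicted as object-class labels.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_for_grounded_tlm(src_tokens, tgt_tokens, region_labels,
                          text_mask_prob=0.15, region_mask_prob=0.15, seed=0):
    """Build one pre-training example for a TLM-style objective extended
    with masked region classification (a sketch, not the authors' code)."""
    rng = random.Random(seed)
    tokens = src_tokens + ["[SEP]"] + tgt_tokens
    text_labels = [None] * len(tokens)            # None = position not predicted
    for i, tok in enumerate(tokens):
        if tok != "[SEP]" and rng.random() < text_mask_prob:
            text_labels[i] = tok                  # model must recover this token
            tokens[i] = MASK_TOKEN

    regions = list(region_labels)
    region_targets = [None] * len(regions)
    for j in range(len(regions)):
        if rng.random() < region_mask_prob:
            region_targets[j] = regions[j]        # model must predict the class
            regions[j] = None                     # region feature zeroed out
    return tokens, text_labels, regions, region_targets
```

A three-way parallel example would pair an English caption, its German translation, and the region labels detected in the shared image; all three enter the same masked example.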
Related papers
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
EVLGen is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation [13.426403221815063]
This paper surveys the landscape of language-and-vision pre-training from the lens of multimodal machine translation.
We summarize the common architectures, pre-training objectives, and datasets from literature and conjecture what further is needed to make progress on multimodal machine translation.
arXiv Detail & Related papers (2023-06-12T15:56:10Z)
- RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training [84.23022072347821]
We propose a regularized cross-lingual visio-textual contrastive learning objective that constrains the representation proximity of weakly-aligned visio-textual inputs.
Experiments on 5 downstream multi-modal tasks across 6 languages demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-05-13T14:41:05Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training [52.852163987208826]
UC2 is the first machine translation-augmented framework for cross-lingual cross-modal representation learning.
We propose two novel pre-training tasks, namely Masked Region-to-Token Modeling (MRTM) and Visual Translation Language Modeling (VTLM).
Our proposed framework achieves new state-of-the-art on diverse non-English benchmarks while maintaining comparable performance to monolingual pre-trained models on English tasks.
arXiv Detail & Related papers (2021-04-01T08:30:53Z)
- Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align both the lexical-level and the higher-level representations of the two languages.
Previous research has shown that unsupervised translation often fails because the lexical representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext to improve the cross-lingual transferability of pre-trained models.
arXiv Detail & Related papers (2020-07-15T16:58:01Z)
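The contrastive pre-training task mentioned in the InfoXLM summary can be illustrated with a toy InfoNCE-style loss over a batch of parallel sentence embeddings: each source vector should score highest against its own translation among all targets in the batch. This is a minimal sketch of the general contrastive idea, not the InfoXLM implementation; the function name, plain-list vectors, and temperature value are assumptions.

```python
import math

def infonce_loss(src_vecs, tgt_vecs, temperature=0.1):
    """Toy InfoNCE loss over parallel sentence embeddings: for each source
    vector, the matching target in the batch is the positive and all other
    targets are in-batch negatives."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    loss = 0.0
    for i, u in enumerate(src_vecs):
        logits = [dot(u, v) / temperature for v in tgt_vecs]
        m = max(logits)                                   # for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]                         # -log softmax at true pair
    return loss / len(src_vecs)
```

With well-aligned pairs the loss approaches zero; if the pairing is shuffled, the loss grows, which is what drives the representations of translations toward each other during pre-training.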
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.