Improving the Cross-Lingual Generalisation in Visual Question Answering
- URL: http://arxiv.org/abs/2209.02982v1
- Date: Wed, 7 Sep 2022 08:07:43 GMT
- Title: Improving the Cross-Lingual Generalisation in Visual Question Answering
- Authors: Farhad Nooralahzadeh, Rico Sennrich
- Abstract summary: Multilingual vision-language pretrained models show poor cross-lingual generalisation when applied to non-English data.
In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task.
We improve cross-lingual transfer with three strategies: (1) we introduce a linguistic prior objective to augment the cross-entropy loss with a similarity-based loss to guide the model during training, (2) we learn a task-specific subnetwork that improves cross-lingual generalisation and reduces variance without model modification, and (3) we augment training examples using synthetic code-mixing to promote alignment of embeddings between source and target languages.
- Score: 40.86774711775718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While multilingual vision-language pretrained models offer several
benefits, recent benchmarks across various tasks and languages show poor
cross-lingual generalisation when these models are applied to non-English data,
with a large gap between (supervised) English performance and (zero-shot)
cross-lingual transfer. In this work, we explore the poor performance of these
models on a
zero-shot cross-lingual visual question answering (VQA) task, where models are
fine-tuned on English visual-question data and evaluated on 7 typologically
diverse languages. We improve cross-lingual transfer with three strategies: (1)
we introduce a linguistic prior objective to augment the cross-entropy loss
with a similarity-based loss to guide the model during training, (2) we learn a
task-specific subnetwork that improves cross-lingual generalisation and reduces
variance without model modification, and (3) we augment training examples using
synthetic code-mixing to promote alignment of embeddings between source and
target languages. Our experiments on xGQA using the pretrained multilingual
multimodal transformers UC2 and M3P demonstrate the consistent effectiveness of
the proposed fine-tuning strategy for 7 languages, outperforming existing
transfer methods with sparse models. Code and data to reproduce our findings
are publicly available.
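As a rough, hedged illustration of strategies (1) and (3) above, the sketches below show one way a VQA cross-entropy objective could be combined with a similarity-based term, and one way English questions could be synthetically code-mixed. They are minimal sketches under assumed interfaces, not the authors' implementation; `combined_vqa_loss`, `code_mix`, `answer_similarity`, and `bilingual_lexicon` are illustrative names.

```python
# Illustrative sketch (not the paper's code): augment standard VQA
# cross-entropy with a similarity-based term that pulls the predicted
# answer distribution towards the similarity profile of the gold answer.
import torch
import torch.nn.functional as F

def combined_vqa_loss(logits, target_ids, answer_similarity, alpha=0.5):
    """
    logits:            (batch, num_answers) classifier outputs
    target_ids:        (batch,) gold answer indices
    answer_similarity: (num_answers, num_answers) precomputed similarity
                       between answer labels, rows normalised to sum to 1
                       (an assumed linguistic prior, e.g. from embeddings)
    alpha:             weight of the similarity-based term (assumed)
    """
    # Standard cross-entropy against the gold answer
    ce = F.cross_entropy(logits, target_ids)

    # Similarity-based term: KL divergence between the model's answer
    # distribution and the similarity profile of the gold answer
    prior = answer_similarity[target_ids]  # (batch, num_answers)
    kl = F.kl_div(F.log_softmax(logits, dim=-1), prior, reduction="batchmean")

    return ce + alpha * kl
```

A similarly minimal sketch of synthetic code-mixing, assuming a word-level bilingual lexicon (for example a publicly available dictionary such as MUSE) that maps English tokens to a target-language equivalent:

```python
import random

def code_mix(question_tokens, bilingual_lexicon, p=0.3):
    """Replace each English token with its target-language translation
    with probability p; tokens missing from the lexicon are kept as-is."""
    return [
        bilingual_lexicon.get(tok, tok) if random.random() < p else tok
        for tok in question_tokens
    ]
```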
Related papers
- xGQA: Cross-Lingual Visual Question Answering [100.35229218735938]
xGQA is a new multilingual evaluation benchmark for the visual question answering task.
We extend the established English GQA dataset to 7 typologically diverse languages.
We propose new adapter-based approaches that extend multimodal transformer-based models to the multilingual setting.
arXiv Detail & Related papers (2021-09-13T15:58:21Z)
- Specializing Multilingual Language Models: An Empirical Study [50.7526245872855]
Contextualized word representations from pretrained multilingual language models have become the de facto standard for addressing natural language tasks.
For languages rarely or never seen by these models, directly using such models often results in suboptimal representations or suboptimal use of the available data.
arXiv Detail & Related papers (2021-06-16T18:13:55Z)
- Lightweight Cross-Lingual Sentence Representation Learning [57.9365829513914]
We introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations.
We propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task.
arXiv Detail & Related papers (2021-05-28T14:10:48Z)
- Adaptive Sparse Transformer for Multilingual Translation [18.017674093519332]
A known challenge of multilingual models is negative language interference.
We propose an adaptive and sparse architecture for multilingual modeling.
Our model outperforms strong baselines in terms of translation quality without increasing the inference cost.
arXiv Detail & Related papers (2021-04-15T10:31:07Z)
- Multilingual Transfer Learning for QA Using Translation as Data Augmentation [13.434957024596898]
We explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer in the semantic space.
We propose two novel strategies, language adversarial training and language arbitration framework, which significantly improve the (zero-resource) cross-lingual transfer performance.
Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.
arXiv Detail & Related papers (2020-12-10T20:29:34Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.