Curriculum Script Distillation for Multilingual Visual Question
Answering
- URL: http://arxiv.org/abs/2301.07227v1
- Date: Tue, 17 Jan 2023 23:55:50 GMT
- Title: Curriculum Script Distillation for Multilingual Visual Question
Answering
- Authors: Khyathi Raghavi Chandu, Alborz Geramifard
- Abstract summary: We introduce a curriculum based on the source and target language translations to finetune the pre-trained models for the downstream task.
We show that target languages that share the same script perform better (~6%) than other languages, and that mixed-script code-switched languages perform better (~5-12%) than their counterparts.
- Score: 10.721189858694396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained models with dual and cross encoders have shown remarkable success
in propelling several tasks in vision and language, including Visual
Question Answering (VQA). However, since they are limited by the requirement
of gold-annotated data, most of these advancements do not see the light of day
in languages beyond English. We aim to address this problem by
introducing a curriculum based on the source and target language translations
to finetune the pre-trained models for the downstream task. Experimental
results demonstrate that script plays a vital role in the performance of these
models. Specifically, we show that target languages that share the same script
perform better (~6%) than other languages, and that mixed-script code-switched
languages perform better (~5-12%) than their counterparts.
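To make the idea concrete, below is a minimal sketch of how a translation-based curriculum could be staged for VQA fine-tuning. The three-stage ordering (source-language data, then code-switched mixes, then target-language translations), the language tags, and the training interface are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of a translation-based curriculum for multilingual VQA fine-tuning.
# The stage ordering, language tags, and train_step interface are assumptions made
# for illustration; they are not the paper's exact configuration.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VQAExample:
    image_id: str
    question: str  # source, translated, or code-switched question text
    answer: str
    language: str  # e.g. "en", "hi", or "en-hi" for a code-switched mix


def curriculum_stages(examples: List[VQAExample], target: str) -> List[List[VQAExample]]:
    """Order fine-tuning data from source-language questions toward the target language."""
    stages = [
        [ex for ex in examples if ex.language == "en"],            # 1. source only
        [ex for ex in examples if ex.language == f"en-{target}"],  # 2. code-switched mix
        [ex for ex in examples if ex.language == target],          # 3. target translations
    ]
    return [stage for stage in stages if stage]  # drop empty stages


def finetune_with_curriculum(train_step: Callable[[VQAExample], None],
                             examples: List[VQAExample],
                             target: str,
                             epochs_per_stage: int = 1) -> None:
    """Run the curriculum: one pass of training per stage, easiest stage first."""
    for stage in curriculum_stages(examples, target):
        for _ in range(epochs_per_stage):
            for ex in stage:
                train_step(ex)  # one update of the pre-trained dual/cross encoder
```

The intent of such staging is that examples closest to the source language and script are seen first, so the model shifts toward the target language gradually rather than in a single jump.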
Related papers
- Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment [50.27950279695363]
The transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language.
Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method.
arXiv Detail & Related papers (2024-06-28T08:59:24Z)
- TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models [50.40191599304911]
We propose TransliCo to fine-tune an mPLM by contrasting sentences in its training data and their transliterations in a unified script.
We show that the resulting model, Furina, outperforms the original Glot500-m on various zero-shot cross-lingual transfer tasks.
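A contrastive objective of this kind typically pulls a sentence's embedding toward the embedding of its own transliteration and pushes it away from the other transliterations in the batch. The sketch below is a generic InfoNCE-style loss written under that assumption; the encoder, batching, and exact loss used by TransliCo may differ.

```python
# Illustrative contrastive alignment between sentences and their transliterations,
# in the spirit of TransliCo; the temperature and symmetric-loss choices are
# generic assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F


def transliteration_contrastive_loss(orig_emb: torch.Tensor,
                                     translit_emb: torch.Tensor,
                                     temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss: each sentence should match its own transliteration
    (same row index) rather than the transliterations of other sentences."""
    orig = F.normalize(orig_emb, dim=-1)          # (batch, dim)
    translit = F.normalize(translit_emb, dim=-1)  # (batch, dim)
    logits = orig @ translit.t() / temperature    # (batch, batch) similarity matrix
    targets = torch.arange(orig.size(0), device=orig.device)
    # Symmetric loss: original -> transliteration and transliteration -> original.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```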
arXiv Detail & Related papers (2024-01-12T15:12:48Z)
- Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages [3.3227703089509304]
We propose a simple yet efficient approach to adapt Vision-Language Pre-training to unseen languages using multilingual pre-trained language models (MPLMs).
Our approach does not require image input and primarily uses machine translation, eliminating the need for target language data.
arXiv Detail & Related papers (2023-06-29T08:20:57Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD, a parallel and large-scale multilingual conversation dataset, for cross-lingual alignment pretraining.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multilingual models with more data outperform monolingual ones but, when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation [6.381149074212897]
We propose a pipeline that utilizes English-only vision-language models to train a monolingual model for a target language.
We release a large-scale visual question answering dataset in Japanese and Hindi.
Our pipeline outperforms the current state-of-the-art models by relative accuracy gains of 4.4% and 13.4%, respectively.
arXiv Detail & Related papers (2022-06-07T14:46:30Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation (IGLUE) benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation [20.33235443471006]
We propose a knowledge distillation approach to extend an English language-vision model (teacher) into an equally effective multilingual and code-mixed model (student).
We also create a large-scale multilingual and code-mixed VQA dataset covering eleven different language setups.
Experimental results and in-depth analysis show that the proposed VQA model outperforms pre-trained language-vision models across eleven diverse language setups.
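As a rough illustration of the teacher-student setup, the snippet below shows a standard distillation loss that blends soft targets from an English teacher with hard labels on multilingual or code-mixed student inputs; the mixing weight, temperature, and interfaces are generic knowledge-distillation assumptions rather than this paper's exact objective.

```python
# Illustrative teacher-student distillation step for extending an English
# vision-language model to multilingual/code-mixed inputs; the loss mixture and
# hyperparameters are generic KD assumptions, not this paper's exact objective.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend soft-target KL against the (English) teacher with the usual
    hard-label cross-entropy on the multilingual/code-mixed student inputs."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```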
arXiv Detail & Related papers (2021-09-10T03:47:29Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.