Generalizing Multimodal Pre-training into Multilingual via Language
Acquisition
- URL: http://arxiv.org/abs/2206.11091v1
- Date: Sun, 29 May 2022 08:53:22 GMT
- Title: Generalizing Multimodal Pre-training into Multilingual via Language
Acquisition
- Authors: Liang Zhang, Anwen Hu, Qin Jin
- Abstract summary: English-based Vision-Language Pre-training has achieved great success in various downstream tasks.
Some efforts have been taken to generalize this success to non-English languages through Multilingual Vision-Language Pre-training.
We propose a MultiLingual Acquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model into a multilingual one.
- Score: 54.69707237195554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: English-based Vision-Language Pre-training (VLP) has achieved great success
in various downstream tasks. Some efforts have been taken to generalize this
success to non-English languages through Multilingual Vision-Language
Pre-training (M-VLP). However, due to the large number of languages, M-VLP
models often require huge computing resources and cannot be flexibly extended
to new languages. In this work, we propose a MultiLingual Acquisition (MLA)
framework that can easily generalize a monolingual Vision-Language Pre-training
model into a multilingual one. Specifically, we design a
lightweight language acquisition encoder based on state-of-the-art monolingual
VLP models. We further propose a two-stage training strategy to optimize the
language acquisition encoder, namely the Native Language Transfer stage and the
Language Exposure stage. With much less multilingual training data and
computing resources, our model achieves state-of-the-art performance on
multilingual image-text and video-text retrieval benchmarks.
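To make the two-stage idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation): a lightweight language acquisition encoder is trained while the monolingual VLP backbone stays frozen, first aligned to the frozen English text encoder via translated captions (Native Language Transfer), then aligned to image features with a contrastive objective (Language Exposure). All module names, dimensions, and loss choices below are assumptions.

```python
# Hypothetical sketch of the MLA idea: a small multilingual encoder is trained
# while the monolingual VLP backbone stays frozen. Sizes and losses are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LanguageAcquisitionEncoder(nn.Module):
    """Small transformer that embeds non-English text into the frozen VLP space."""
    def __init__(self, vocab_size=250_000, dim=512, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        return self.proj(h.mean(dim=1))           # pooled sentence embedding

def native_language_transfer_loss(mla, frozen_vlp_text_encoder, src_ids, en_ids):
    """Stage 1: align multilingual embeddings with the frozen English text encoder."""
    with torch.no_grad():
        target = frozen_vlp_text_encoder(en_ids)  # English caption embedding
    return F.mse_loss(mla(src_ids), target)

def language_exposure_loss(mla, image_features, src_ids, temperature=0.07):
    """Stage 2: contrastive image-text alignment for the new languages."""
    txt = F.normalize(mla(src_ids), dim=-1)
    img = F.normalize(image_features, dim=-1)
    logits = img @ txt.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
```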
Related papers
- Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language [7.289015788793582]
This work focuses on increasing technological participation for the Sámi language.
We draw the attention of the ML community towards the language modeling problem of Ultra Low Resource (ULR) languages.
We have compiled the available Sámi language resources from the web to create a clean dataset for training language models.
arXiv Detail & Related papers (2024-05-09T13:54:22Z)
- PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLM) trained on 640 billion (640B) tokens and available in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into the training data and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage of pre-training.
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
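The two-stage data curriculum mentioned above (30% non-English in the first stage, 60% in the final stage) can be expressed as a simple sampling schedule; the stage boundary and batch construction below are illustrative assumptions, not PolyLM's actual data pipeline.

```python
import random

def non_english_ratio(step, total_steps, stage_boundary=0.5):
    """Two-stage curriculum from the abstract: 30% non-English data in the first
    stage and 60% in the final stage. The boundary at 50% of training is an assumption."""
    return 0.30 if step < stage_boundary * total_steps else 0.60

def sample_batch(english_pool, multilingual_pool, step, total_steps, batch_size=8):
    """Draw each example from the non-English pool with the scheduled probability."""
    p = non_english_ratio(step, total_steps)
    return [
        random.choice(multilingual_pool) if random.random() < p
        else random.choice(english_pool)
        for _ in range(batch_size)
    ]
```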
arXiv Detail & Related papers (2023-07-12T09:00:37Z)
- Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages [3.3227703089509304]
We propose a simple yet efficient approach to adapt Vision-Language Pre-training to unseen languages using a multilingual pre-trained language model (MPLM).
Our approach does not require image input and primarily uses machine translation, eliminating the need for target language data.
arXiv Detail & Related papers (2023-06-29T08:20:57Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training [52.852163987208826]
UC2 is the first machine translation-augmented framework for cross-lingual cross-modal representation learning.
We propose two novel pre-training tasks, namely Masked Region-to-Token Modeling (MRTM) and Visual Translation Language Modeling (VTLM).
Our proposed framework achieves new state-of-the-art on diverse non-English benchmarks while maintaining comparable performance to monolingual pre-trained models on English tasks.
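As a rough illustration of the VTLM objective named above, the sketch below builds a masked input from a caption and its machine translation, so a model must recover masked tokens from the other language (and, in the full framework, from image regions, which are omitted here). The token ids and masking rate are assumptions, not UC2's implementation.

```python
import random
import torch

MASK_ID = 103          # placeholder [MASK] token id (assumption)

def vtlm_mask(en_ids, trans_ids, mask_prob=0.15):
    """Hypothetical input builder for Visual Translation Language Modeling:
    concatenate an English caption with its machine translation and randomly
    mask tokens in both halves; masked positions become prediction targets."""
    tokens = torch.cat([en_ids, trans_ids])
    labels = torch.full_like(tokens, -100)   # -100 = ignored by cross-entropy
    for i in range(tokens.numel()):
        if random.random() < mask_prob:
            labels[i] = tokens[i]            # keep the original token as the target
            tokens[i] = MASK_ID              # then mask it in the input
    return tokens, labels
```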
arXiv Detail & Related papers (2021-04-01T08:30:53Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
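A common building block for this kind of adaptation is extending the vocabulary and embedding matrix with tokens for the unseen script; the sketch below shows that generic step with Hugging Face transformers. The example characters and mean-based initialization are assumptions; the paper proposes more data-efficient, lexically informed strategies.

```python
# Generic recipe for unseen scripts: add new tokens and grow the embedding
# matrix. This is not the paper's specific method, only the common first step.
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

new_tokens = ["ᚠ", "ᚢ", "ᚦ"]                      # example characters from an unseen script
added = tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))      # grows the embedding matrix

# Initialize the new rows from the mean of existing embeddings (an assumption;
# the paper studies better initializations for low-resource adaptation).
if added:
    with torch.no_grad():
        emb = model.get_input_embeddings().weight
        emb[-added:] = emb[:-added].mean(dim=0, keepdim=True)
```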
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- Multilingual Translation with Extensible Multilingual Pretraining and Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning pretrained models on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
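Assuming the publicly released mBART-50 many-to-many checkpoint on Hugging Face corresponds to this line of work, a minimal usage example of the resulting multilingual translation model looks like this (the language codes and sample sentence are illustrative):

```python
# Minimal usage sketch: translate Hindi to French with the mBART-50
# many-to-many checkpoint (assumed to be the model released with this work).
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt")

tokenizer.src_lang = "hi_IN"                      # source language code
encoded = tokenizer(
    "संयुक्त राष्ट्र के प्रमुख का कहना है कि सीरिया में कोई सैन्य समाधान नहीं है",
    return_tensors="pt")
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"])  # target language
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```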
arXiv Detail & Related papers (2020-08-02T05:36:55Z)