PALO: A Polyglot Large Multimodal Model for 5B People
- URL: http://arxiv.org/abs/2402.14818v2
- Date: Tue, 5 Mar 2024 11:22:07 GMT
- Title: PALO: A Polyglot Large Multimodal Model for 5B People
- Authors: Muhammad Maaz, Hanoona Rasheed, Abdelrahman Shaker, Salman Khan,
Hisham Cholakkal, Rao M. Anwer, Tim Baldwin, Michael Felsberg, Fahad S. Khan
- Abstract summary: In pursuit of more inclusive Vision-Language Models (VLMs), this study introduces a Large Multimodal Model called PALO.
PALO offers visual reasoning capabilities in 10 major languages, including English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In pursuit of more inclusive Vision-Language Models (VLMs), this study
introduces a Large Multilingual Multimodal Model called PALO. PALO offers
visual reasoning capabilities in 10 major languages, including English,
Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese,
that together span ~5B people (65% of the world population). Our approach
uses semi-automated translation to adapt the multimodal instruction dataset
from English to the target languages with a fine-tuned Large Language Model,
ensuring high linguistic fidelity while remaining scalable through minimal
manual effort. The incorporation of diverse instruction sets helps us boost
overall performance across multiple languages, especially underrepresented
ones such as Hindi, Arabic, Bengali, and Urdu. The resulting models are
trained across three scales (1.7B, 7B, and 13B parameters) to demonstrate
generalization and scalability, and we observe substantial improvements over
strong baselines. We also propose the first multilingual multimodal benchmark,
enabling future approaches to evaluate their vision-language reasoning
capabilities across languages. Code:
https://github.com/mbzuai-oryx/PALO.
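In outline, the semi-automated adaptation the abstract describes translates only the text fields of each instruction sample while reusing the images. The following is a minimal sketch under assumed names (translate_with_llm, TARGET_LANGUAGES, the sample schema); it is not PALO's actual implementation, which is linked above.

```python
# Hypothetical sketch of the semi-automated dataset-adaptation loop described
# above. All names are illustrative assumptions, not PALO's released code.

TARGET_LANGUAGES = ["zh", "hi", "es", "fr", "ar", "bn", "ru", "ur", "ja"]

def translate_with_llm(text: str, target_lang: str) -> str:
    """Stand-in for the fine-tuned translation LLM the abstract mentions.

    Replace with a call to an actual translation model; the tag prefix
    below only keeps the sketch runnable end to end.
    """
    return f"[{target_lang}] {text}"

def adapt_dataset(english_samples: list[dict], target_lang: str) -> list[dict]:
    """Translate an English multimodal instruction set into one target language.

    Images are language-agnostic, so only the text fields are translated;
    manual effort is then limited to spot-checking the outputs.
    """
    return [
        {
            "image": s["image"],  # image reference is kept untouched
            "instruction": translate_with_llm(s["instruction"], target_lang),
            "response": translate_with_llm(s["response"], target_lang),
            "language": target_lang,
        }
        for s in english_samples
    ]

if __name__ == "__main__":
    demo = [{"image": "coco/0001.jpg",
             "instruction": "Describe the image.",
             "response": "A cat sits on a red sofa."}]
    for lang in TARGET_LANGUAGES:
        print(adapt_dataset(demo, lang)[0]["instruction"])
```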
Related papers
- Poro 34B and the Blessing of Multilinguality [3.270981284471548]
Poro 34B is a 34 billion parameter model trained for 1 trillion tokens of Finnish, English, and programming languages.
We show that a multilingual training approach can produce a model that substantially advances the capabilities of existing models for Finnish.
arXiv Detail & Related papers (2024-04-02T11:34:12Z)
- Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model [33.87586041774359]
Aya is a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced.
We introduce extensive new evaluation suites that broaden the state of the art in multilingual evaluation across 99 languages.
We conduct detailed investigations on the optimal finetuning mixture composition, data pruning, as well as the toxicity, bias, and safety of our models.
arXiv Detail & Related papers (2024-02-12T17:34:13Z)
- Baichuan 2: Open Large-scale Language Models [51.56361715162972]
We present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch on 2.6 trillion tokens.
Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval.
arXiv Detail & Related papers (2023-09-19T04:13:22Z)
- Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages [76.35234803589412]
MPM is an effective paradigm for training large multimodal models in non-English languages.
Using MPM, we build VisCPM, large multimodal models for image-to-text and text-to-image generation, which achieve state-of-the-art open-source performance in Chinese.
arXiv Detail & Related papers (2023-08-23T09:55:41Z)
- PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLM) trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into the training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage of pre-training. (A minimal sketch of such a curriculum schedule appears after this list.)
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
arXiv Detail & Related papers (2023-07-12T09:00:37Z)
- Generalizing Multimodal Pre-training into Multilingual via Language Acquisition [54.69707237195554]
English-based Vision-Language Pre-training has achieved great success in various downstream tasks.
Some efforts have been made to generalize this success to non-English languages through Multilingual Vision-Language Pre-training.
We propose a MultiLingual Acquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model into a multilingual one.
arXiv Detail & Related papers (2022-05-29T08:53:22Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-centric models brings gains of more than 10 BLEU when directly translating between non-English directions, while performing competitively with the best single systems from WMT. (A usage example with the released M2M-100 model appears after this list.)
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Multilingual Translation with Extensible Multilingual Pretraining and Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
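PolyLM's curriculum (referenced in its entry above) raises the non-English share of training data from 30% to 60% over pre-training. Below is a minimal sketch of such a schedule; the linear ramp between the two stages is an assumption, since the paper specifies only the start and end proportions, and the sample pools are hypothetical.

```python
# Hedged sketch of a PolyLM-style curriculum: the non-English share of each
# batch rises from 30% early in pre-training to 60% by the end. The linear
# ramp is an assumption; the paper states only the stage-wise proportions.
import random

def non_english_fraction(step: int, total_steps: int,
                         start: float = 0.30, end: float = 0.60) -> float:
    """Fraction of non-English samples to draw at a given training step."""
    progress = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * progress

def sample_batch(step: int, total_steps: int,
                 english_pool: list, non_english_pool: list,
                 batch_size: int = 8) -> list:
    """Draw one batch whose language mix follows the curriculum."""
    n_non_en = round(batch_size * non_english_fraction(step, total_steps))
    batch = random.sample(non_english_pool, n_non_en)
    batch += random.sample(english_pool, batch_size - n_non_en)
    random.shuffle(batch)
    return batch

# The mix shifts as training progresses: 30% -> 45% -> 60% non-English.
for step in (0, 50_000, 100_000):
    print(step, f"{non_english_fraction(step, 100_000):.0%} non-English")
```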
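The "Beyond English-Centric Multilingual Machine Translation" entry corresponds to the open-sourced M2M-100 model, so direct non-English-direction translation can be tried via its Hugging Face checkpoint. The snippet assumes the transformers library and the facebook/m2m100_418M checkpoint are available.

```python
# Direct non-English-direction translation (Hindi -> French) with M2M-100,
# the many-to-many model released with "Beyond English-Centric Multilingual
# Machine Translation". Assumes the Hugging Face transformers library and
# the facebook/m2m100_418M checkpoint.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "hi"  # source language: Hindi
inputs = tokenizer("जीवन एक चॉकलेट बॉक्स की तरह है।", return_tensors="pt")

# Force the decoder to start in the target language (French): no English pivot.
generated = model.generate(
    **inputs, forced_bos_token_id=tokenizer.get_lang_id("fr")
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```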