Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
- URL: http://arxiv.org/abs/2407.18129v2
- Date: Fri, 26 Jul 2024 15:34:12 GMT
- Title: Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
- Authors: Fakhraddin Alwajih, Gagan Bhatia, Muhammad Abdul-Mageed,
- Abstract summary: We introduce an efficient Arabic multimodal assistant, dubbed Dallah, that utilizes an advanced language model based on LLaMA-2 to facilitate multimodal interactions.
Dallah demonstrates state-of-the-art performance in Arabic MLLMs and excels in two benchmark tests.
Dallah has the potential to pave the way for further development of dialect-aware Arabic MLLMs.
- Score: 14.453861745003865
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advancements have significantly enhanced the capabilities of Multimodal Large Language Models (MLLMs) in generating and understanding image-to-text content. Despite these successes, progress is predominantly limited to English due to the scarcity of high quality multimodal resources in other languages. This limitation impedes the development of competitive models in languages such as Arabic. To alleviate this situation, we introduce an efficient Arabic multimodal assistant, dubbed Dallah, that utilizes an advanced language model based on LLaMA-2 to facilitate multimodal interactions. Dallah demonstrates state-of-the-art performance in Arabic MLLMs. Through fine-tuning six Arabic dialects, Dallah showcases its capability to handle complex dialectal interactions incorporating both textual and visual elements. The model excels in two benchmark tests: one evaluating its performance on Modern Standard Arabic (MSA) and another specifically designed to assess dialectal responses. Beyond its robust performance in multimodal interaction tasks, Dallah has the potential to pave the way for further development of dialect-aware Arabic MLLMs.
Related papers
- AIN: The Arabic INclusive Large Multimodal Model [71.29419186696138]
AIN is an English-Arabic bilingual LMM designed to excel in English and Arabic.
AIN demonstrates state-of-the-art Arabic performance, while also possessing strong English-language visual capabilities.
AIN's superior capabilities position it as a significant step toward empowering Arabic speakers with advanced multimodal generative AI tools.
arXiv Detail & Related papers (2025-01-31T18:58:20Z) - MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish [17.36441080071885]
This report presents MERaLiON-TextLLM, a series of open-source language models specifically tailored to improve understanding and generation in Chinese, Indonesian, Malay, and Singlish.
Our approach achieves performance improvements across benchmarks in these languages, exceeding the capabilities of the official Llama-3 models.
arXiv Detail & Related papers (2024-12-21T05:50:48Z) - Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion [55.27025066199226]
This paper addresses the need for democratizing large language models (LLM) in the Arab world.
One practical objective for an Arabic LLM is to utilize an Arabic-specific vocabulary for the tokenizer that could speed up decoding.
Inspired by the vocabulary learning during Second Language (Arabic) Acquisition for humans, the released AraLLaMA employs progressive vocabulary expansion.
arXiv Detail & Related papers (2024-12-16T19:29:06Z) - VITA: Towards Open-Source Interactive Omni Multimodal LLM [104.52782565106033]
We introduce VITA, the first-ever open-source Multimodal Large Language Model (MLLM)
We endow the language model with visual and audio capabilities through two-stage multi-task learning.
VITA demonstrates robust foundational capabilities of multilingual, vision, and audio understanding.
arXiv Detail & Related papers (2024-08-09T17:59:49Z) - ALLaM: Large Language Models for Arabic and English [9.881560166505452]
We present ALLaM: Arabic Large Language Model, a series of large language models to support the ecosystem of Arabic Language Technologies (ALT)
Our autoregressive decoder-only architecture models demonstrate how second-language acquisition via vocabulary expansion and pretraining can steer a model towards a new language (Arabic) without any catastrophic forgetting in the original language (English)
We show that extensive alignment with human preferences can significantly enhance the performance of a language model compared to models of a larger scale with lower quality alignment.
arXiv Detail & Related papers (2024-07-22T05:35:17Z) - GemmAr: Enhancing LLMs Through Arabic Instruction-Tuning [0.0]
We introduce InstAr-500k, a new Arabic instruction dataset created by generating and collecting content.
We assess this dataset by fine-tuning an open-source Gemma-7B model on several downstream tasks to improve its functionality.
Based on multiple evaluations, our fine-tuned model achieves excellent performance on several Arabic NLP benchmarks.
arXiv Detail & Related papers (2024-07-02T10:43:49Z) - Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks [29.819766942335416]
Multimodal large language models (MLLMs) have proven effective in a wide range of tasks requiring complex reasoning and linguistic comprehension.
We introduce a comprehensive family of Arabic MLLMs, dubbed textitPeacock, with strong vision and language capabilities.
arXiv Detail & Related papers (2024-03-01T23:38:02Z) - TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild [102.93338424976959]
We introduce TextBind, an almost annotation-free framework for empowering larger language models with the multi-turn interleaved instruction-following capabilities.
Our approach requires only image-caption pairs and generates multi-turn multimodal instruction-response conversations from a language model.
To accommodate interleaved image-text inputs and outputs, we devise MIM, a language model-centric architecture that seamlessly integrates image encoder and decoder models.
arXiv Detail & Related papers (2023-09-14T15:34:01Z) - PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLMs) trained on 640 billion (B) tokens, avaliable in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training.
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
arXiv Detail & Related papers (2023-07-12T09:00:37Z) - Generalizing Multimodal Pre-training into Multilingual via Language
Acquisition [54.69707237195554]
English-based Vision-Language Pre-training has achieved great success in various downstream tasks.
Some efforts have been taken to generalize this success to non-English languages through Multilingual Vision-Language Pre-training.
We propose a textbfMultitextbfLingual textbfAcquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model into multilingual.
arXiv Detail & Related papers (2022-05-29T08:53:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.