Related papers: DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding

URL: http://arxiv.org/abs/2601.19898v1
Date: Tue, 27 Jan 2026 18:59:19 GMT
Title: DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding
Authors: Shubham Patle, Sara Ghaboura, Hania Tariq, Mohammad Usman Khan, Omkar Thawakar, Rao Muhammad Anwer, Salman Khan,
Abstract summary: We present DuwatBench, a benchmark of 1,272 curated samples containing about 1,475 unique words across six classical and modern calligraphic styles. The dataset reflects real-world challenges in Arabic writing, such as complex stroke patterns, dense ligatures, and stylistic variations. Using DuwatBench, we evaluated 13 leading Arabic and multilingual multimodal models and showed that while they perform well on clean text, they struggle with calligraphic variation, artistic distortions, and precise visual-text alignment.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Arabic calligraphy represents one of the richest visual traditions of the Arabic language, blending linguistic meaning with artistic form. Although multimodal models have advanced across languages, their ability to process Arabic script, especially in artistic and stylized calligraphic forms, remains largely unexplored. To address this gap, we present DuwatBench, a benchmark of 1,272 curated samples containing about 1,475 unique words across six classical and modern calligraphic styles, each paired with sentence-level detection annotations. The dataset reflects real-world challenges in Arabic writing, such as complex stroke patterns, dense ligatures, and stylistic variations that often challenge standard text recognition systems. Using DuwatBench, we evaluated 13 leading Arabic and multilingual multimodal models and showed that while they perform well on clean text, they struggle with calligraphic variation, artistic distortions, and precise visual-text alignment. By publicly releasing DuwatBench and its annotations, we aim to advance culturally grounded multimodal research, foster fair inclusion of the Arabic language and visual heritage in AI systems, and support continued progress in this area. Our dataset (https://huggingface.co/datasets/MBZUAI/DuwatBench) and evaluation suite (https://github.com/mbzuai-oryx/DuwatBench) are publicly available.
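The abstract does not say how DuwatBench scores model outputs. As an illustration only, assuming a recognition-style comparison between a model's transcription and the ground-truth text, the sketch below computes character error rate (CER), a common text-recognition metric: the Levenshtein edit distance divided by the reference length. The function names and the NFC preprocessing step are assumptions and are not taken from the DuwatBench evaluation suite.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / reference length, after NFC normalization (assumed preprocessing)."""
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical example: the model drops one lam from "بسم الله".
print(character_error_rate("بسم الله", "بسم اله"))  # 0.125 (1 edit / 8 reference characters)
```

Character-level scoring gives partial credit when a model misreads a single letter inside a dense ligature, which a word-level exact match would not.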

Related papers

Aladdin-FTI @ AMIYA Three Wishes for Arabic NLP: Fidelity, Diglossia, and Multidialectal Generation
Arabic dialects have long been under-represented in Natural Language Processing (NLP) research. Recent advances in the field, such as Large Language Models (LLMs), offer promising avenues to address this gap. This paper presents Aladdin-FTI, our submission to the AMIYA shared task.
(arXiv, 2026-02-18T09:15:20Z)
ADAB: Arabic Dataset for Automated Politeness Benchmarking -- A Large-Scale Resource for Computational Sociopragmatics
We introduce ADAB (Arabic Politeness dataset), a new annotated Arabic dataset collected from four online platforms. The dataset was annotated based on Arabic linguistic traditions and pragmatic theory, resulting in three classes: polite, impolite, and neutral. It contains 10,000 samples with linguistic feature annotations across 16 politeness categories and achieves substantial inter-annotator agreement.
(arXiv, 2026-02-14T19:58:53Z)
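The ADAB summary reports "substantial" inter-annotator agreement without naming the statistic. One common choice for two annotators labeling the same items is Cohen's kappa, which corrects raw agreement for the agreement expected by chance; on the widely used Landis and Koch scale, values from 0.61 to 0.80 are labeled "substantial". The sketch below assumes that statistic and uses made-up labels drawn from ADAB's three classes; the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability that both annotators independently pick the same label.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()) / n**2
    if p_expected == 1.0:
        return 1.0  # both used one identical label everywhere; kappa is undefined, report perfect agreement
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for six items (not ADAB data).
ann_1 = ["polite", "polite", "neutral", "impolite", "neutral", "polite"]
ann_2 = ["polite", "neutral", "neutral", "impolite", "neutral", "polite"]
print(round(cohens_kappa(ann_1, ann_2), 3))  # 0.739
```

Here raw agreement is 5/6, but chance agreement is 13/36, so kappa (17/23) is noticeably lower than the raw percentage.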
Towards Visual Text Design Transfer Across Languages
We introduce a novel task of Multimodal Style Translation (MuST-Bench). MuST-Bench is a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems. In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions.
(arXiv, 2024-10-24T15:15:01Z)
Training a Bilingual Language Model by Mapping Tokens onto a Shared Character Space
We train a bilingual Arabic-Hebrew language model using a transliterated version of Arabic texts in Hebrew. We assess the performance of a language model that employs a unified script for both languages, on machine translation.
(arXiv, 2024-02-25T11:26:39Z)
AceGPT, Localizing Large Language Models in Arabic
The paper proposes a comprehensive solution that includes pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic. The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities.
(arXiv, 2023-09-21T13:20:13Z)
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting
We argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language model with noise input. We propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting.
(arXiv, 2022-11-19T03:50:33Z)
Graphemic Normalization of the Perso-Arabic Script
This paper documents the challenges that Perso-Arabic presents beyond the best-documented languages. We focus on the situation in natural language processing (NLP), which is affected by multiple, often neglected, issues. We evaluate the effects of script normalization on eight languages from diverse language families in the Perso-Arabic script diaspora on machine translation and statistical language modeling tasks.
(arXiv, 2022-10-21T21:59:44Z)
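To make the normalization problem concrete, the sketch below applies three generic steps to Persian text: Unicode NFC composition, removal of the tatweel (kashida) stretching character, and folding Arabic-specific code points (yeh U+064A, alef maksura U+0649, kaf U+0643) into the Persian forms (farsi yeh U+06CC, keheh U+06A9). The mapping table and the `normalize_persian` name are illustrative assumptions, not the rules from the paper; since the paper evaluates eight languages, each would need its own target forms.

```python
import unicodedata

# Illustrative mapping only (an assumption, not the paper's rule set): fold
# Arabic-specific code points into the forms conventionally used in Persian.
TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u0640": None,      # ARABIC TATWEEL (kashida): visual stretching only, so delete it
})


def normalize_persian(text: str) -> str:
    # NFC composes canonical sequences, e.g. alef U+0627 + hamza above U+0654 -> U+0623.
    text = unicodedata.normalize("NFC", text)
    return text.translate(TO_PERSIAN)


# "book" typed with Arabic kaf becomes the Persian spelling with keheh.
print(normalize_persian("\u0643\u062A\u0627\u0628") == "\u06A9\u062A\u0627\u0628")  # True
```

Visually identical strings that differ only in these code points would otherwise be counted as different tokens by a translation or language model.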
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
We develop a technique that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images. The "vokenizer" is trained on relatively small image captioning datasets, and we then apply it to generate vokens for large language corpora. Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks.
(arXiv, 2020-10-14T02:11:51Z)
Automatic Arabic Dialect Identification Systems for Written Texts: A Survey
Arabic dialect identification is a specific task of natural language processing, aiming to automatically predict the Arabic dialect of a given text. In this paper, we present a comprehensive survey of Arabic dialect identification research in written texts. We review the traditional machine learning methods, deep learning architectures, and complex learning approaches to Arabic dialect identification.
(arXiv, 2020-09-26T15:33:16Z)
A Hybrid Deep Learning Model for Arabic Text Recognition
This paper presents a model that can recognize Arabic text that was printed using multiple font types. The proposed model employs a hybrid DL network that can recognize Arabic printed text without the need for character segmentation. The model achieved good results in recognizing characters and words, and it also achieved promising results in recognizing characters when it was tested on unseen data.
(arXiv, 2020-09-04T02:49:17Z)
AraDIC: Arabic Document Classification using Image-Based Character Embeddings and Class-Balanced Loss
We propose a novel end-to-end Arabic document classification framework, Arabic document image-based classifier (AraDIC). AraDIC consists of an image-based character encoder and a classifier. They are trained in an end-to-end fashion using a class-balanced loss to deal with the long-tailed data distribution problem. To the best of our knowledge, this is the first image-based character embedding framework addressing the problem of Arabic text classification.
(arXiv, 2020-06-20T14:25:06Z)
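AraDIC's summary names a class-balanced loss for long-tailed data without giving the formula. Assuming the common effective-number formulation of Cui et al. (2019), a class with n training samples is weighted in proportion to (1 - β) / (1 - β^n), and that weight scales a standard cross-entropy term. The sketch below implements this assumption in plain Python; the function names and the default β = 0.999 are illustrative choices, not values from the AraDIC paper.

```python
import math


def class_balanced_weights(counts: list[int], beta: float = 0.999) -> list[float]:
    """Per-class weights from the inverse 'effective number', normalized to sum to len(counts)."""
    if not 0.0 <= beta < 1.0 or any(n <= 0 for n in counts):
        raise ValueError("need 0 <= beta < 1 and positive class counts")
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]  # 1 / effective number of samples
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


def class_balanced_ce(logits: list[float], label: int, weights: list[float]) -> float:
    """Weighted cross-entropy for one example: -w[label] * log softmax(logits)[label]."""
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -weights[label] * (logits[label] - log_sum_exp)


# Hypothetical long-tailed counts: a frequent class and a rare class.
w = class_balanced_weights([5000, 50])
print([round(x, 3) for x in w])  # the rare class receives the larger weight
print(class_balanced_ce([2.0, -1.0], label=1, weights=w))
```

Setting β = 0 makes every weight 1 (plain cross-entropy), while β close to 1 approaches inverse-frequency weighting; that tunable interpolation is the main reason to prefer this form over raw inverse counts.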

This list is automatically generated from the titles and abstracts of the papers on this site.