Related papers: LAILA: A Large Trait-Based Dataset for Arabic Automated Essay Scoring

URL: http://arxiv.org/abs/2512.24235v1
Date: Tue, 30 Dec 2025 13:49:52 GMT
Title: LAILA: A Large Trait-Based Dataset for Arabic Automated Essay Scoring
Authors: May Bashendy, Walid Massoud, Sohaila Eltanbouly, Salam Albatarni, Marwan Sayed, Abrar Abir, Houda Bouamor, Tamer Elsayed
Abstract summary: LAILA is the largest publicly available Arabic AES dataset to date, comprising 7,859 essays annotated with holistic and trait-specific scores on seven dimensions: relevance, organization, vocabulary, style, development, mechanics, and grammar. We detail the dataset design, collection, and annotations, and provide benchmark results using state-of-the-art Arabic and English models in prompt-specific and cross-prompt settings.
Score: 7.121813878009244
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Automated Essay Scoring (AES) has gained increasing attention in recent years, yet research on Arabic AES remains limited due to the lack of publicly available datasets. To address this, we introduce LAILA, the largest publicly available Arabic AES dataset to date, comprising 7,859 essays annotated with holistic and trait-specific scores on seven dimensions: relevance, organization, vocabulary, style, development, mechanics, and grammar. We detail the dataset design, collection, and annotations, and provide benchmark results using state-of-the-art Arabic and English models in prompt-specific and cross-prompt settings. LAILA fills a critical need in Arabic AES research, supporting the development of robust scoring systems.

Related papers

Qayyem: A Real-time Platform for Scoring Proficiency of Arabic Essays [5.404427910866254]
We present Qayyem, a Web-based platform designed to support Arabic AES. Qayyem provides an integrated workflow for assignment creation, batch essay upload, scoring configuration, and per-trait essay evaluation. The platform deploys a number of state-of-the-art Arabic essay scoring models with different effectiveness and efficiency figures.
arXiv Detail & Related papers (2026-03-01T09:26:47Z)
Dhati+: Fine-tuned Large Language Models for Arabic Subjectivity Evaluation [0.0]
Despite its significance, Arabic faces the challenge of being under-resourced. The scarcity of large annotated datasets hampers the development of accurate tools for subjectivity analysis in Arabic. Recent advances in deep learning and Transformers have proven highly effective for text classification in English and French.
arXiv Detail & Related papers (2025-08-27T15:20:12Z)
EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian [55.08460390792863]
EmoBench-UA is the first annotated dataset for emotion detection in Ukrainian texts. Our findings highlight the challenges of emotion classification in non-mainstream languages like Ukrainian.
arXiv Detail & Related papers (2025-05-29T09:49:57Z)
ATHAR: A High-Quality and Diverse Dataset for Classical Arabic to English Translation [1.3750624267664155]
Classical Arabic represents a significant era that encompasses the golden age of Arab culture, philosophy, and scientific literature. We have identified a scarcity of translation datasets in Classical Arabic, which are often limited in scope and topics. We present the ATHAR dataset, which comprises 66,000 high-quality classical Arabic to English translation samples.
arXiv Detail & Related papers (2024-07-29T09:45:34Z)
GemmAr: Enhancing LLMs Through Arabic Instruction-Tuning [0.0]
We introduce InstAr-500k, a new Arabic instruction dataset created by generating and collecting content. We assess this dataset by fine-tuning an open-source Gemma-7B model on several downstream tasks to improve its functionality. Based on multiple evaluations, our fine-tuned model achieves excellent performance on several Arabic NLP benchmarks.
arXiv Detail & Related papers (2024-07-02T10:43:49Z)
From Multiple-Choice to Extractive QA: A Case Study for English and Arabic [51.13706104333848]
We explore the feasibility of repurposing an existing multilingual dataset for a new NLP task. We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic. We aim to help others adapt our approach for the remaining 120 BELEBELE language variants, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z)
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering [13.65056111661002]
We introduce ArabicaQA, the first large-scale dataset for machine reading comprehension and open-domain question answering in Arabic. We also present AraDPR, the first dense passage retrieval model trained on the Arabic Wikipedia corpus.
arXiv Detail & Related papers (2024-03-26T16:37:54Z)
Arabic Text Sentiment Analysis: Reinforcing Human-Performed Surveys with Wider Topic Analysis [49.1574468325115]
The in-depth study manually analyses 133 ASA papers published in the English language between 2002 and 2020. The main findings show the different approaches used for ASA: machine learning, lexicon-based and hybrid approaches. There is a need to develop ASA tools that can be used in industry, as well as in academia, for Arabic text SA.
arXiv Detail & Related papers (2024-03-04T10:37:48Z)
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic [51.922112625469836]
We present ArabicMMLU, the first multi-task language understanding benchmark for the Arabic language. Our data comprises 40 tasks and 14,575 multiple-choice questions in Modern Standard Arabic (MSA) and is carefully constructed by collaborating with native speakers in the region. Our evaluations of 35 models reveal substantial room for improvement, particularly among the best open-source models.
arXiv Detail & Related papers (2024-02-20T09:07:41Z)
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning [49.79783940841352]
Existing datasets are almost all in the English language. We work with fluent speakers of languages from around the world to collect natural instances of instructions and completions. We create the most extensive multilingual collection to date, comprising 513 million instances through templating and translating existing datasets across 114 languages.
arXiv Detail & Related papers (2024-02-09T18:51:49Z)
AceGPT, Localizing Large Language Models in Arabic [73.39989503874634]
The paper proposes a comprehensive solution that includes pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic. The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities.
arXiv Detail & Related papers (2023-09-21T13:20:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.