MSA at BAREC Shared Task 2025: Ensembling Arabic Transformers for Readability Assessment
- URL: http://arxiv.org/abs/2509.10040v1
- Date: Fri, 12 Sep 2025 08:08:45 GMT
- Title: MSA at BAREC Shared Task 2025: Ensembling Arabic Transformers for Readability Assessment
- Authors: Mohamed Basem, Mohamed Younes, Seif Ahmed, Abdelrahman Moustafa,
- Abstract summary: We present MSA's winning system for the BAREC 2025 Shared Task on fine-grained Arabic readability assessment. Our approach is a confidence-weighted ensemble of four complementary transformer models. The system reached 87.5 percent QWK at the sentence level and 87.4 percent at the document level.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present MSA's winning system for the BAREC 2025 Shared Task on fine-grained Arabic readability assessment, achieving first place in six of six tracks. Our approach is a confidence-weighted ensemble of four complementary transformer models (AraBERTv2, AraELECTRA, MARBERT, and CAMeLBERT), each fine-tuned with a distinct loss function to capture diverse readability signals. To tackle severe class imbalance and data scarcity, we applied weighted training, advanced preprocessing, SAMER corpus relabeling with our strongest model, and synthetic data generation via Gemini 2.5 Flash, adding about 10,000 rare-level samples. A targeted post-processing step corrected prediction distribution skew, delivering a 6.3 percent Quadratic Weighted Kappa (QWK) gain. Our system reached 87.5 percent QWK at the sentence level and 87.4 percent at the document level, demonstrating the power of model and loss diversity, confidence-informed fusion, and intelligent augmentation for robust Arabic readability prediction.
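The confidence-weighted fusion described in the abstract can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the authors' released code: it assumes each model emits a softmax distribution over readability levels and that a model's weight is its own maximum probability (its confidence), which is one common reading of "confidence-weighted ensemble".

```python
# Hypothetical sketch of confidence-weighted ensembling (not the authors' code).
# Each model contributes its probability distribution over readability levels,
# weighted by its own confidence (here taken as its max probability).
def confidence_weighted_ensemble(prob_dists):
    n_classes = len(prob_dists[0])
    weights = [max(p) for p in prob_dists]          # per-model confidence
    total = sum(weights)
    fused = [
        sum(w * p[c] for w, p in zip(weights, prob_dists)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: fused[c])

# Example: three models scoring one sentence over 5 readability levels
models = [
    [0.05, 0.10, 0.60, 0.20, 0.05],  # confident in level 2
    [0.10, 0.30, 0.35, 0.15, 0.10],  # uncertain, weighted down
    [0.02, 0.08, 0.70, 0.15, 0.05],  # confident in level 2
]
print(confidence_weighted_ensemble(models))  # -> 2
```

Under this scheme, an unsure model contributes less to the fused distribution than a confident one, which is the intuition behind confidence-informed fusion.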
Related papers
- SmallML: Bayesian Transfer Learning for Small-Data Predictive Analytics [0.0]
SmallML achieves enterprise-level prediction accuracy with datasets as small as 50-200 observations. Validation on customer churn data demonstrates 96.7% +/- 4.2% AUC with 100 observations per business. By enabling enterprise-grade predictions for 33 million U.S. SMEs, SmallML addresses a critical gap in AI democratization.
arXiv Detail & Related papers (2025-11-18T02:00:55Z) - Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data [55.65426108082807]
We build Uni-MoE-2.0-Omni from scratch through three core contributions. It is capable of omnimodal understanding, as well as generating images, text, and speech.
arXiv Detail & Related papers (2025-11-16T14:10:55Z) - Balanced Multi-Task Attention for Satellite Image Classification: A Systematic Approach to Achieving 97.23% Accuracy on EuroSAT Without Pre-Training [0.0]
This work presents a systematic investigation of custom convolutional neural network architectures for satellite land use classification. We achieve 97.23% test accuracy on the EuroSAT dataset without reliance on pre-trained models. Our approach achieves performance within 1.34% of fine-tuned ResNet-50 (98.57%) while requiring no external data.
arXiv Detail & Related papers (2025-10-17T10:59:24Z) - mucAI at BAREC Shared Task 2025: Towards Uncertainty Aware Arabic Readability Assessment [0.0]
We present a model-agnostic technique for fine-grained Arabic readability classification in the BAREC 2025 Shared Task. Our method applies conformal prediction to generate prediction sets with coverage guarantees, then computes weighted averages using softmax-renormalized probabilities over the conformal sets. This uncertainty-aware decoding improves Quadratic Weighted Kappa (QWK) by reducing high-penalty misclassifications, pulling errors toward nearer levels.
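The conformal decoding summarized above can be sketched in simplified form. This is an illustrative assumption-laden reconstruction, not the mucAI implementation: it assumes the conformal set keeps every class whose probability clears a calibrated threshold (derived from a quantile `qhat` fitted on held-out data), renormalizes the probabilities over that set, and decodes the probability-weighted average level.

```python
# Illustrative, simplified conformal-set decoding (not the paper's code).
# `qhat` is assumed to be a calibrated quantile from a held-out set.
def conformal_weighted_level(probs, qhat):
    # Conformal set: classes whose probability clears the threshold
    kept = [c for c, p in enumerate(probs) if p >= 1 - qhat]
    if not kept:  # guarantee a non-empty set by falling back to argmax
        kept = [max(range(len(probs)), key=lambda c: probs[c])]
    z = sum(probs[c] for c in kept)  # renormalize over the kept classes
    # Probability-weighted average level, rounded to the nearest label
    return round(sum(c * probs[c] / z for c in kept))

# Example: with qhat = 0.85 the threshold is 0.15, keeping levels 1-3;
# their weighted average (~1.94) rounds to level 2.
print(conformal_weighted_level([0.05, 0.25, 0.45, 0.20, 0.05], 0.85))  # -> 2
```

Averaging over the retained levels, rather than taking a hard argmax, tends to shrink large ordinal errors, which is exactly what quadratic weighting in QWK penalizes most.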
arXiv Detail & Related papers (2025-09-18T23:14:51Z) - A Confidence-Diversity Framework for Calibrating AI Judgement in Accessible Qualitative Coding Tasks [0.0]
Confidence-diversity calibration is a quality assessment framework for accessible coding tasks. Analysing 5,680 coding decisions from eight state-of-the-art LLMs, we find that mean self-confidence tracks inter-model agreement closely.
arXiv Detail & Related papers (2025-08-04T03:47:10Z) - Advancing Dialectal Arabic to Modern Standard Arabic Machine Translation [22.369277951685234]
This paper presents two core contributions to advancing DA-MSA translation for the Levantine, Egyptian, and Gulf dialects. Few-shot prompting consistently outperformed zero-shot, chain-of-thought, and our proposed Ara-TEaR method. For fine-tuning LLMs, a quantized Gemma2-9B model achieved a chrF++ score of 49.88, outperforming zero-shot GPT-4o (44.58).
arXiv Detail & Related papers (2025-07-27T14:37:53Z) - Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin [56.37346003683629]
Adapting vision-language models (VLMs) to downstream tasks with pseudolabels has gained increasing attention. A major obstacle is that the pseudolabels generated by VLMs tend to be imbalanced, leading to inferior performance. We propose a novel framework incorporating concept alignment and confusion-aware margin mechanisms.
arXiv Detail & Related papers (2025-05-04T10:24:34Z) - SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition [65.19303535139453]
We present our winning approach for the MER-NOISE and MER-OV tracks of the MER2024 Challenge on multimodal emotion recognition.
Our system leverages the advanced emotional understanding capabilities of Emotion-LLaMA to generate high-quality annotations for unlabeled samples.
For the MER-OV track, our utilization of Emotion-LLaMA for open-vocabulary annotation yields an 8.52% improvement in average accuracy and recall compared to GPT-4V.
arXiv Detail & Related papers (2024-08-20T02:46:03Z) - Common 7B Language Models Already Possess Strong Math Capabilities [61.61442513067561]
This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities.
The potential for extensive scaling is constrained by the scarcity of publicly available math questions.
arXiv Detail & Related papers (2024-03-07T18:00:40Z) - How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts [54.07541591018305]
We present MAD-Bench, a benchmark that contains 1000 test samples divided into 5 categories, such as non-existent objects, count of objects, and spatial relationship.
We provide a comprehensive analysis of popular MLLMs, ranging from GPT-4v, Reka, Gemini-Pro, to open-sourced models, such as LLaVA-NeXT and MiniCPM-Llama3.
While GPT-4o achieves 82.82% accuracy on MAD-Bench, the accuracy of any other model in our experiments ranges from 9% to 50%.
arXiv Detail & Related papers (2024-02-20T18:31:27Z) - TCE at Qur'an QA 2023 Shared Task: Low Resource Enhanced Transformer-based Ensemble Approach for Qur'anic QA [0.0]
We present our approach to tackle Qur'an QA 2023 shared tasks A and B.
To address the challenge of low-resourced training data, we rely on transfer learning together with a voting ensemble.
We employ different architectures and learning mechanisms for a range of Arabic pre-trained transformer-based models for both tasks.
arXiv Detail & Related papers (2024-01-23T19:32:54Z) - The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z) - Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on RNN-Transducer together with improved beam search, reaches quality only 3.8% WER (absolute) worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.