CVPD at QIAS 2025 Shared Task: An Efficient Encoder-Based Approach for Islamic Inheritance Reasoning
- URL: http://arxiv.org/abs/2509.00457v2
- Date: Fri, 05 Sep 2025 20:27:56 GMT
- Title: CVPD at QIAS 2025 Shared Task: An Efficient Encoder-Based Approach for Islamic Inheritance Reasoning
- Authors: Salah Eddine Bekhouche, Abdellah Zakaria Sellam, Hichem Telli, Cosimo Distante, Abdenour Hadid
- Abstract summary: Islamic inheritance law (Ilm al-Mawarith) requires precise identification of heirs and calculation of shares. We present a framework for solving inheritance questions using a specialised Arabic text encoder and Attentive Relevance Scoring (ARS). The system ranks answer options according to semantic relevance, and enables fast, on-device inference without generative reasoning.
- Score: 6.5255476646093316
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Islamic inheritance law (Ilm al-Mawarith) requires precise identification of heirs and calculation of shares, which poses a challenge for AI. In this paper, we present a lightweight framework for solving multiple-choice inheritance questions using a specialised Arabic text encoder and Attentive Relevance Scoring (ARS). The system ranks answer options according to semantic relevance, and enables fast, on-device inference without generative reasoning. We evaluate Arabic encoders (MARBERT, ArabicBERT, AraBERT) and compare them with API-based LLMs (Gemini, DeepSeek) on the QIAS 2025 dataset. While large models achieve an accuracy of up to 87.6%, they require more resources and are context-dependent. Our MARBERT-based approach achieves 69.87% accuracy, presenting a compelling case for efficiency, on-device deployability, and privacy. While this is lower than the 87.6% achieved by the best-performing LLM, our work quantifies a critical trade-off between the peak performance of large models and the practical advantages of smaller, specialized systems in high-stakes domains.
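The abstract says the system ranks answer options by semantic relevance but does not give the ARS formulation. As a rough illustration only, the sketch below ranks multiple-choice options by cosine similarity between a question embedding and each option embedding, then normalises the scores with a softmax. This is a generic stand-in, not the authors' ARS; the function names, the `temperature` parameter, and the toy vectors are all assumptions, and in practice the embeddings would come from an Arabic encoder such as MARBERT.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_options(question_vec, option_vecs, temperature=1.0):
    """Return (indices sorted best-first, softmax probabilities per option).

    Hypothetical helper: plain cosine scoring plus softmax, used here as a
    stand-in for a learned relevance scorer.
    """
    scores = [cosine(question_vec, v) / temperature for v in option_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order, probs


# Toy, made-up 3-dimensional "embeddings"; real ones would come from an encoder.
question = [0.9, 0.1, 0.3]
options = {"A": [0.1, 0.9, 0.2], "B": [0.8, 0.2, 0.4], "C": [0.0, 0.1, 0.9]}

labels = list(options)
order, probs = rank_options(question, [options[k] for k in labels])
best = labels[order[0]]
print(best, [round(p, 3) for p in probs])
```

Because scoring needs only one encoder pass per text and a handful of dot products, this style of ranking avoids token-by-token generation, which is consistent with the efficiency argument made in the abstract.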
Related papers
- ALPS: A Diagnostic Challenge Set for Arabic Linguistic & Pragmatic Reasoning [0.0]
ALPS (Arabic Linguistic & Pragmatic Suite) is a native, expert-curated diagnostic challenge set probing Deep Semantics and Pragmatics. ALPS targets the depth of linguistic understanding through 531 rigorously crafted questions across 15 tasks and 47 subtasks. We developed the dataset with deep expertise in Arabic linguistics, guaranteeing cultural authenticity and eliminating translation artifacts.
arXiv Detail & Related papers (2026-02-19T03:51:37Z)
- Agent Skill Framework: Perspectives on the Potential of Small Language Models in Industrial Environments [14.079091139464175]
This work introduces a formal mathematical definition of the Agent Skill process, followed by a systematic evaluation of language models of varying sizes. Results show that tiny models struggle with reliable skill selection, while moderately sized SLMs (approximately 12B - 30B) benefit substantially from the Agent Skill approach.
arXiv Detail & Related papers (2026-02-18T17:52:17Z)
- ELAIPBench: A Benchmark for Expert-Level Artificial Intelligence Paper Understanding [49.67493845115009]
ELAIPBench is a benchmark curated by domain experts to evaluate large language models' comprehension of AI research papers. It spans three difficulty levels and emphasizes non-trivial reasoning rather than shallow retrieval. Experiments show that the best-performing LLM achieves an accuracy of only 39.95%, far below human performance.
arXiv Detail & Related papers (2025-10-12T11:11:20Z)
- Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions [1.1883838320818292]
Large language models (LLMs) in hiring promise to streamline candidate screening, but they also raise serious concerns regarding accuracy and algorithmic bias. We benchmark several state-of-the-art foundational LLMs and compare them with our proprietary domain-specific hiring model (Match Score) for job candidate matching. Our experiments show that Match Score outperforms the general-purpose LLMs on accuracy (ROC AUC 0.85 vs 0.77) and achieves significantly more equitable outcomes across demographic groups.
arXiv Detail & Related papers (2025-07-02T19:02:18Z)
- Product of Experts with LLMs: Boosting Performance on ARC Is a Matter of Perspective [3.2771631221674333]
We leverage task-specific data augmentations throughout the training, generation, and scoring phases. We employ a depth-first search algorithm to generate diverse, high-probability candidate solutions. Our method achieves a score of 71.6% (286.5/400 solved tasks) on the public ARC-AGI evaluation set.
arXiv Detail & Related papers (2025-05-08T11:17:10Z)
- MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? [64.62421656031128]
MLRC-Bench is a benchmark designed to quantify how effectively language agents can tackle challenging Machine Learning (ML) Research Competitions. Unlike prior work, MLRC-Bench measures the key steps of proposing and implementing novel research methods. Even the best-performing tested agent closes only 9.3% of the gap between baseline and top human participant scores.
arXiv Detail & Related papers (2025-04-13T19:35:43Z)
- SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines [118.8024915014751]
Large language models (LLMs) have demonstrated remarkable proficiency in academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. We present SuperGPQA, a benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines.
arXiv Detail & Related papers (2025-02-20T17:05:58Z)
- How well can LLMs Grade Essays in Arabic? [3.101490720236325]
This research assesses the effectiveness of large language models (LLMs) in the task of Arabic automated essay scoring (AES) using the AR-AES dataset. It explores various evaluation methodologies, including zero-shot, few-shot in-context learning, and fine-tuning. A mixed-language prompting strategy, integrating English prompts with Arabic content, was implemented to improve model comprehension and performance.
arXiv Detail & Related papers (2025-01-27T21:30:02Z)
- Can Large Language Models Predict the Outcome of Judicial Decisions? [0.0]
Large Language Models (LLMs) have shown exceptional capabilities in Natural Language Processing (NLP). We benchmark state-of-the-art open-source LLMs, including LLaMA-3.2-3B and LLaMA-3.1-8B, under varying configurations. Our results demonstrate that fine-tuned smaller models achieve comparable performance to larger models in task-specific contexts.
arXiv Detail & Related papers (2025-01-15T11:32:35Z)
- Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities. LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands. We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
- EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
- DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs [70.54226917774933]
We propose the Decomposition-Alignment-Reasoning Agent (DARA) framework. DARA effectively parses questions into formal queries through a dual mechanism. We show that DARA attains performance comparable to state-of-the-art enumerating-and-ranking-based methods for KGQA.
arXiv Detail & Related papers (2024-06-11T09:09:37Z)