Assessing Large Language Models on Islamic Legal Reasoning: Evidence from Inheritance Law Evaluation
- URL: http://arxiv.org/abs/2509.01081v2
- Date: Wed, 17 Sep 2025 06:42:45 GMT
- Title: Assessing Large Language Models on Islamic Legal Reasoning: Evidence from Inheritance Law Evaluation
- Authors: Abdessalam Bouchekif, Samer Rashwani, Heba Sbahi, Shahd Gaben, Mutaz Al-Khatib, Mohammed Ghaly
- Abstract summary: o3 and Gemini 2.5 achieved accuracies above 90%, whereas ALLaM, Fanar, LLaMA, and Mistral scored below 50%. We conduct a detailed error analysis to identify recurring failure patterns across models. Our findings highlight limitations in handling structured legal reasoning and suggest directions for improving performance in Islamic legal reasoning.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper evaluates the knowledge and reasoning capabilities of Large Language Models in Islamic inheritance law, known as 'ilm al-mawarith. We assess the performance of seven LLMs using a benchmark of 1,000 multiple-choice questions covering diverse inheritance scenarios, designed to test models' ability to understand the inheritance context and compute the distribution of shares prescribed by Islamic jurisprudence. The results reveal a significant performance gap: o3 and Gemini 2.5 achieved accuracies above 90%, whereas ALLaM, Fanar, LLaMA, and Mistral scored below 50%. These disparities reflect important differences in reasoning ability and domain adaptation. We conduct a detailed error analysis to identify recurring failure patterns across models, including misunderstandings of inheritance scenarios, incorrect application of legal rules, and insufficient domain knowledge. Our findings highlight limitations in handling structured legal reasoning and suggest directions for improving performance in Islamic legal reasoning. Code: https://github.com/bouchekif/inheritance_evaluation
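To make the task concrete, below is a minimal worked example of the kind of share computation the benchmark targets. The scenario (wife, one son, one daughter) and the Python sketch are illustrative only and are not drawn from the paper's question set or its released code.

```python
from fractions import Fraction

# Illustrative scenario (not taken from the paper's benchmark):
# the deceased leaves a wife, one son, and one daughter.
# Under widely taught Sunni fiqh rules, the wife takes a fixed 1/8
# share when the deceased has children; the residue passes to the
# children, with each son receiving twice the share of a daughter.

estate = Fraction(1)

wife = Fraction(1, 8)        # fixed (fard) share in the presence of descendants
residue = estate - wife      # 7/8 remains for the residuary heirs

son = residue * Fraction(2, 3)       # 7/12
daughter = residue * Fraction(1, 3)  # 7/24

assert wife + son + daughter == estate
print(f"wife: {wife}, son: {son}, daughter: {daughter}")
```

The benchmark's multiple-choice questions ask models to carry out exactly this kind of arithmetic over fixed shares and residues, which is where the error analysis locates many of the observed failures.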
Related papers
- IslamicLegalBench: Evaluating LLMs Knowledge and Reasoning of Islamic Law Across 1,200 Years of Islamic Pluralist Legal Traditions [1.3052252174353483]
IslamicLegalBench is the first benchmark evaluating LLMs across seven schools of Islamic jurisprudence. The best model achieves only 68% correctness with a 21% hallucination rate. Few-shot prompting provides minimal gains, improving only 2 of 9 models by more than 1%.
arXiv Detail & Related papers (2026-02-02T10:30:59Z) - HALF: Harm-Aware LLM Fairness Evaluation Aligned with Deployment [52.374772443536045]
HALF (Harm-Aware LLM Fairness) is a framework that assesses model bias in realistic applications and weighs the outcomes by harm severity. We show that HALF exposes a clear gap between previous benchmarking success and deployment readiness.
arXiv Detail & Related papers (2025-10-14T07:13:26Z) - SoK: Large Language Model Copyright Auditing via Fingerprinting [69.14570598973195]
We introduce a unified framework and formal taxonomy that categorizes existing methods into white-box and black-box approaches. We propose LeaFBench, the first systematic benchmark for evaluating LLM fingerprinting under realistic deployment scenarios.
arXiv Detail & Related papers (2025-08-27T12:56:57Z) - QU-NLP at QIAS 2025 Shared Task: A Two-Phase LLM Fine-Tuning and Retrieval-Augmented Generation Approach for Islamic Inheritance Reasoning [1.0152838128195467]
We fine-tuned the Fanar-1-9B causal language model using Low-Rank Adaptation (LoRA) and integrated it into a Retrieval-Augmented Generation pipeline. Our system achieves an accuracy of 0.858 on the final test set, outperforming competitive models such as GPT-4.5, LLaMA, Fanar, Mistral, and ALLaM evaluated with zero-shot prompting.
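For context, here is a minimal sketch of the kind of LoRA setup the abstract describes, using the Hugging Face peft library. The model identifier, rank, and target modules are illustrative assumptions, not the authors' actual configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "QCRI/Fanar-1-9B"  # assumed Hub identifier for illustration
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```

The adapted model would then typically be paired with a retriever that prepends relevant inheritance-law passages to each question before generation.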
arXiv Detail & Related papers (2025-08-20T10:29:55Z) - Benchmarking the Legal Reasoning of LLMs in Arabic Islamic Inheritance Cases [1.3521447196536418]
The Islamic inheritance domain holds significant importance for Muslims, ensuring the fair distribution of shares among heirs. Recent advancements in Large Language Models (LLMs) have sparked interest in their potential to assist with complex legal reasoning tasks. This study evaluates the reasoning capabilities of state-of-the-art LLMs in interpreting and applying Islamic inheritance laws.
arXiv Detail & Related papers (2025-08-13T10:37:58Z) - LEXam: Benchmarking Legal Reasoning on 340 Law Exams [61.344330783528015]
LEXam is a novel benchmark derived from 340 law exams spanning 116 law school courses across a range of subjects and degree levels. The dataset comprises 4,886 law exam questions in English and German, including 2,841 long-form, open-ended questions and 2,045 multiple-choice questions.
arXiv Detail & Related papers (2025-05-19T08:48:12Z) - J&H: Evaluating the Robustness of Large Language Models Under Knowledge-Injection Attacks in Legal Domain [12.550611136062722]
We propose a method of legal knowledge injection attacks for robustness testing. The aim of the framework is to explore whether LLMs perform deductive reasoning when accomplishing legal tasks. We have collected mistakes that legal experts might make in judicial decisions in the real world.
arXiv Detail & Related papers (2025-03-24T05:42:05Z) - Evaluating the Correctness of Inference Patterns Used by LLMs for Judgment [53.17596274334017]
We evaluate the correctness of the detailed inference patterns of an LLM behind its seemingly correct outputs. Experiments show that even when the language generation results appear correct, a significant portion of the inference patterns used by the LLM for the legal judgment may represent misleading or irrelevant logic.
arXiv Detail & Related papers (2024-10-06T08:33:39Z) - The Factuality of Large Language Models in the Legal Domain [8.111302195052641]
This paper investigates the factuality of large language models (LLMs) as knowledge bases in the legal domain.
We design a dataset of diverse factual questions about case law and legislation.
We then use the dataset to evaluate several LLMs under different evaluation methods, including exact, alias, and fuzzy matching.
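A minimal sketch of the three matching regimes named above; the normalization, alias table, and fuzzy threshold are illustrative assumptions rather than the paper's exact protocol.

```python
from difflib import SequenceMatcher

# Hypothetical alias table mapping a canonical answer to accepted variants.
ALIASES = {
    "court of justice of the european union": {"cjeu", "european court of justice"},
}

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def exact_match(pred: str, gold: str) -> bool:
    return normalize(pred) == normalize(gold)

def alias_match(pred: str, gold: str) -> bool:
    gold_norm = normalize(gold)
    accepted = {gold_norm} | ALIASES.get(gold_norm, set())
    return normalize(pred) in accepted

def fuzzy_match(pred: str, gold: str, threshold: float = 0.85) -> bool:
    # Character-level similarity; the 0.85 cutoff is an assumption.
    return SequenceMatcher(None, normalize(pred), normalize(gold)).ratio() >= threshold
```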
arXiv Detail & Related papers (2024-09-18T08:30:20Z) - Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z) - Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are previous legal cases with similar facts, which serve as the basis for judging subsequent cases in national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z) - Knowledge is Power: Understanding Causality Makes Legal Judgment Prediction Models More Generalizable and Robust [3.555105847974074]
Legal Judgment Prediction (LJP) serves as a form of legal assistance, mitigating the heavy workload of a limited pool of legal practitioners.
Most existing methods apply various large-scale pre-trained language models fine-tuned on LJP tasks to obtain consistent improvements.
We discover that the state-of-the-art (SOTA) model makes judgment predictions according to irrelevant (or non-causal) information.
arXiv Detail & Related papers (2022-11-06T07:03:31Z)