Augmented Question-guided Retrieval (AQgR) of Indian Case Law with LLM, RAG, and Structured Summaries
- URL: http://arxiv.org/abs/2508.04710v1
- Date: Wed, 23 Jul 2025 05:24:44 GMT
- Title: Augmented Question-guided Retrieval (AQgR) of Indian Case Law with LLM, RAG, and Structured Summaries
- Authors: Vishnuprabha V, Daleesha M Viswanathan, Rajesh R, Aneesh V Pillai
- Abstract summary: This paper proposes the use of Large Language Models (LLMs) to facilitate the retrieval of relevant cases. Our approach combines Retrieval Augmented Generation (RAG) with structured summaries optimized for Indian case law. The system generates targeted legal questions based on factual scenarios to identify relevant case law more effectively.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying relevant legal precedents remains challenging, as most retrieval methods emphasize factual similarity over legal issues, and current systems often lack explanations clarifying case relevance. This paper proposes the use of Large Language Models (LLMs) to address this gap by facilitating the retrieval of relevant cases, generating explanations to elucidate relevance, and identifying core legal issues all autonomously, without requiring legal expertise. Our approach combines Retrieval Augmented Generation (RAG) with structured summaries optimized for Indian case law. Leveraging the Augmented Question-guided Retrieval (AQgR) framework, the system generates targeted legal questions based on factual scenarios to identify relevant case law more effectively. The structured summaries were assessed manually by legal experts, given the absence of a suitable structured summary dataset. Case law retrieval was evaluated using the FIRE dataset, and explanations were reviewed by legal experts, as explanation generation alongside case retrieval is an emerging innovation. Experimental evaluation on a subset of the FIRE 2019 dataset yielded promising outcomes, achieving a Mean Average Precision (MAP) score of 0.36 and a Mean Average Recall (MAR) of 0.67 across test queries, significantly surpassing the current MAP benchmark of 0.1573. This work introduces a suite of novel contributions to advance case law retrieval. By transitioning from fact-based to legal-issue-based retrieval, the proposed approach delivers more contextually relevant results that align closely with legal professionals' needs. Integrating legal questions within the retrieval process through the AQgR framework ensures more precise and meaningful retrieval by refining the context of queries.
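The abstract reports retrieval quality as Mean Average Precision (MAP) and Mean Average Recall (MAR). This listing does not say exactly how the paper computes them, so the following is only a minimal sketch under stated assumptions: average precision is normalised by the number of relevant cases per query, and MAR is taken to be per-query recall averaged over queries. The function names, case IDs, and toy data are hypothetical and are not taken from the paper or the FIRE dataset.

```python
from typing import Callable, Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k where a relevant case appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, case_id in enumerate(ranked, start=1):
        if case_id in relevant:
            hits += 1
            precision_sum += hits / k
    # Assumption: normalise by all relevant cases, retrieved or not.
    return precision_sum / len(relevant)


def recall(ranked: List[str], relevant: Set[str]) -> float:
    """Fraction of the relevant cases that appear anywhere in the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)


def mean_over_queries(
    metric: Callable[[List[str], Set[str]], float],
    runs: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
) -> float:
    """Average a per-query metric over every query that has judgments."""
    scores = [metric(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: ranked case IDs per query and their relevant sets.
runs = {"q1": ["c1", "c2", "c3", "c4"], "q2": ["c5", "c6", "c7", "c8"]}
qrels = {"q1": {"c1", "c3"}, "q2": {"c6", "c9"}}

map_score = mean_over_queries(average_precision, runs, qrels)
mar_score = mean_over_queries(recall, runs, qrels)
print(f"MAP={map_score:.4f} MAR={mar_score:.4f}")
```

Under these assumptions, a query whose relevant cases are retrieved but ranked low still gets full recall while its average precision drops, which is one way a system can show a recall figure well above its MAP.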
Related papers
- LegalOne: A Family of Foundation Models for Reliable Legal Reasoning [54.57434222018289]
We present LegalOne, a family of foundational models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI.
arXiv Detail & Related papers (2026-01-31T10:18:32Z)
- LegalMALR: Multi-Agent Query Understanding and LLM-Based Reranking for Chinese Statute Retrieval [10.997604609194033]
Statute retrieval is essential for legal assistance and judicial decision support. Real-world legal queries are often implicit, multi-issue, and expressed in colloquial or underspecified forms. We present LegalMALR, a retrieval framework that integrates a Multi-Agent Query Understanding System with a zero-shot large-language-generated reranking module.
arXiv Detail & Related papers (2026-01-25T04:44:56Z)
- ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation [56.79698529022327]
Legal claims refer to the plaintiff's demands in a case and are essential to guiding judicial reasoning and case resolution. This paper explores the problem of legal claim generation based on the given case's facts. We construct ClaimGen-CN, the first dataset for the Chinese legal claim generation task.
arXiv Detail & Related papers (2025-08-24T07:19:25Z)
- A Reasoning-Focused Legal Retrieval Benchmark [28.607778538115642]
We introduce two novel legal RAG benchmarks: Bar Exam QA and Housing Statute QA. Our results suggest that legal RAG remains a challenging application, thus motivating future research.
arXiv Detail & Related papers (2025-05-06T20:44:03Z)
- A Reproducibility Study of Graph-Based Legal Case Retrieval [1.6819960041696331]
CaseLink is a graph-based method for legal case retrieval. CaseLink captures higher-order relationships of cases going beyond the stand-alone level of documents. Challenges in reproducing novel results have recently been highlighted.
arXiv Detail & Related papers (2025-04-11T10:04:12Z)
- A Law Reasoning Benchmark for LLM with Tree-Organized Structures including Factum Probandum, Evidence and Experiences [76.73731245899454]
We propose a transparent law reasoning schema enriched with hierarchical factum probandum, evidence, and implicit experience. Inspired by this schema, we introduce the challenging task, which takes a textual case description and outputs a hierarchical structure justifying the final decision. This benchmark paves the way for transparent and accountable AI-assisted law reasoning in the "Intelligent Court".
arXiv Detail & Related papers (2025-03-02T10:26:54Z)
- Evaluating LLM-based Approaches to Legal Citation Prediction: Domain-specific Pre-training, Fine-tuning, or RAG? A Benchmark and an Australian Law Case Study [9.30538764385435]
Large Language Models (LLMs) have demonstrated strong potential across legal tasks, yet the problem of legal citation prediction remains under-explored. We introduce the AusLaw Citation Benchmark, a real-world dataset comprising 55k Australian legal instances and 18,677 unique citations. We then conduct a systematic benchmarking across a range of solutions. Results show that neither general nor law-specific LLMs suffice as stand-alone solutions, with performance near zero.
arXiv Detail & Related papers (2024-12-09T07:46:14Z)
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
- Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs [67.54302101989542]
Legal case retrieval aims to provide similar cases as references for a given fact description.
Existing works mainly focus on case-to-case retrieval using lengthy queries.
Data scale is insufficient to satisfy the training requirements of existing data-hungry neural models.
arXiv Detail & Related papers (2024-10-09T06:26:39Z)
- LawLLM: Law Large Language Model for the US Legal System [43.13850456765944]
We introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain.
LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP).
We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format.
arXiv Detail & Related papers (2024-07-27T21:51:30Z)
- DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
- Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval [16.29803062332164]
We propose a few-shot approach where large language models assist in generating expert-aligned relevance judgments. The proposed approach decomposes the judgment process into several stages, mimicking the workflow of human annotators. It also ensures interpretable data labeling, providing transparency and clarity in the relevance assessment process.
arXiv Detail & Related papers (2024-03-27T09:46:56Z)
- MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness.
Existing SCR datasets only focus on the fact description section when judging the similarity between cases.
We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal element with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.