Hybrid Retrieval-Augmented Generation Agent for Trustworthy Legal Question Answering in Judicial Forensics
- URL: http://arxiv.org/abs/2511.01668v1
- Date: Mon, 03 Nov 2025 15:30:58 GMT
- Title: Hybrid Retrieval-Augmented Generation Agent for Trustworthy Legal Question Answering in Judicial Forensics
- Authors: Yueqing Xi, Yifan Bai, Huasen Luo, Weiliang Wen, Hui Liu, Haoliang Li
- Abstract summary: We present a hybrid legal QA agent tailored for judicial settings. It integrates retrieval-augmented generation (RAG) with multi-model ensembling to deliver reliable, auditable, and continuously updatable counsel.
- Score: 30.232667436008978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As artificial intelligence permeates judicial forensics, ensuring the veracity and traceability of legal question answering (QA) has become critical. Conventional large language models (LLMs) are prone to hallucination, risking misleading guidance in legal consultation, while static knowledge bases struggle to keep pace with frequently updated statutes and case law. We present a hybrid legal QA agent tailored for judicial settings that integrates retrieval-augmented generation (RAG) with multi-model ensembling to deliver reliable, auditable, and continuously updatable counsel. The system prioritizes retrieval over generation: when a trusted legal repository yields relevant evidence, answers are produced via RAG; otherwise, multiple LLMs generate candidates that are scored by a specialized selector, with the top-ranked answer returned. High-quality outputs then undergo human review before being written back to the repository, enabling dynamic knowledge evolution and provenance tracking. Experiments on the Law_QA dataset show that our hybrid approach significantly outperforms both a single-model baseline and a vanilla RAG pipeline on F1, ROUGE-L, and an LLM-as-a-Judge metric. Ablations confirm the complementary contributions of retrieval prioritization, model ensembling, and the human-in-the-loop update mechanism. The proposed system demonstrably reduces hallucination while improving answer quality and legal compliance, advancing the practical deployment of media forensics technologies in judicial scenarios.
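The retrieval-first control flow described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the overlap-based relevance score, the `threshold` value, and all function names are stand-ins for the system's actual retriever and selector components.

```python
def retrieve(repo, question, threshold=0.5):
    """Return the best-matching repository entry if it clears the
    relevance threshold, else None. Word overlap stands in for a
    real retriever (e.g. a dense or BM25 index)."""
    def overlap(entry):
        q, e = set(question.lower().split()), set(entry.lower().split())
        return len(q & e) / max(len(q), 1)
    best = max(repo, key=overlap, default=None)
    return best if best is not None and overlap(best) >= threshold else None

def hybrid_answer(question, repo, generators, selector):
    """Retrieval-first hybrid QA: answer via RAG when trusted evidence
    is found; otherwise ensemble LLM candidates and let a selector
    rank them, returning the top-scored answer."""
    evidence = retrieve(repo, question)
    if evidence is not None:
        return ("rag", evidence)            # grounded, auditable path
    candidates = [g(question) for g in generators]
    best = max(candidates, key=selector)    # selector scores candidates
    return ("ensemble", best)
```

In the full system, answers from either path that pass human review would be written back into `repo`, closing the update loop the abstract describes.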
Related papers
- LegalOne: A Family of Foundation Models for Reliable Legal Reasoning [54.57434222018289]
We present LegalOne, a family of foundational models specifically tailored for the Chinese legal domain. LegalOne is developed through a comprehensive three-phase pipeline designed to master legal reasoning. We publicly release the LegalOne weights and the LegalKit evaluation framework to advance the field of Legal AI.
arXiv Detail & Related papers (2026-01-31T10:18:32Z) - LegalMALR: Multi-Agent Query Understanding and LLM-Based Reranking for Chinese Statute Retrieval [10.997604609194033]
Statute retrieval is essential for legal assistance and judicial decision support. Real-world legal queries are often implicit, multi-issue, and expressed in colloquial or underspecified forms. We present LegalMALR, a retrieval framework that integrates a Multi-Agent Query Understanding System with a zero-shot LLM-based reranking module.
arXiv Detail & Related papers (2026-01-25T04:44:56Z) - AppellateGen: A Benchmark for Appellate Legal Judgment Generation [30.9030336647868]
We introduce AppellateGen, a benchmark for second-instance legal judgment generation comprising 7,351 case pairs. The task requires models to draft legally binding judgments by reasoning over the initial verdict and evidentiary updates. We propose a judicial Standard Operating Procedure (SOP)-based Legal Multi-Agent System (SLMAS) to simulate the judicial process, which decomposes the generation process into discrete stages of issue identification, retrieval, and drafting.
arXiv Detail & Related papers (2026-01-04T02:15:17Z) - ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering [54.72902502486611]
ReAG is a Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages. ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence.
arXiv Detail & Related papers (2025-11-27T19:01:02Z) - L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search [3.662162441273026]
L-MARS (Legal Multi-Agent with Orchestrated Reasoning and Agentic Search) is a system that reduces hallucination and uncertainty in legal question answering. Unlike single-pass retrieval-augmented generation (RAG), L-MARS decomposes queries into subproblems. It employs a Judge Agent to verify sufficiency, jurisdiction, and temporal validity before answer synthesis.
arXiv Detail & Related papers (2025-08-31T09:23:26Z) - Scaling Legal AI: Benchmarking Mamba and Transformers for Statutory Classification and Case Law Retrieval [0.0]
We present the first comprehensive benchmarking of Mamba, a state-space model with linear-time selective mechanisms, against leading transformer models for statutory classification and case law retrieval. Results show that Mamba's linear scaling enables processing of legal documents several times longer than transformers. Our findings highlight trade-offs between state-space models and transformers, providing guidance for deploying legal AI in statutory analysis, judicial decision support, and policy research.
arXiv Detail & Related papers (2025-08-29T17:38:47Z) - Segment First, Retrieve Better: Realistic Legal Search via Rhetorical Role-Based Queries [3.552993426200889]
TraceRetriever mirrors real-world legal search by operating with limited case information. Our pipeline integrates BM25, Vector Database, and Cross-Encoder models, combining initial results through Reciprocal Rank Fusion. Rhetorical annotations are generated using a Hierarchical BiLSTM CRF classifier trained on Indian judgments.
arXiv Detail & Related papers (2025-08-01T14:49:33Z) - Augmented Question-guided Retrieval (AQgR) of Indian Case Law with LLM, RAG, and Structured Summaries [0.0]
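Reciprocal Rank Fusion, used above to merge the BM25, vector, and cross-encoder result lists, has a standard form: each document's fused score is the sum of 1/(k + rank) over every list it appears in. The sketch below uses the conventional smoothing constant k = 60; the document IDs are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists with Reciprocal Rank Fusion.
    Each document scores sum(1 / (k + rank)) over the lists in which
    it appears; higher fused score means higher final rank."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because only ranks (not raw scores) are used, RRF needs no score normalization across heterogeneous retrievers, which is why it is a common choice for hybrid pipelines like this one.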
This paper proposes the use of Large Language Models (LLMs) to facilitate the retrieval of relevant cases. Our approach combines Retrieval Augmented Generation (RAG) with structured summaries optimized for Indian case law. The system generates targeted legal questions based on factual scenarios to identify relevant case law more effectively.
arXiv Detail & Related papers (2025-07-23T05:24:44Z) - Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation [108.13261761812517]
We introduce FRANQ (Faithfulness-based Retrieval Augmented UNcertainty Quantification), a novel method for hallucination detection in RAG outputs. We present a new long-form Question Answering (QA) dataset annotated for both factuality and faithfulness.
arXiv Detail & Related papers (2025-05-27T11:56:59Z) - Evaluating LLM-based Approaches to Legal Citation Prediction: Domain-specific Pre-training, Fine-tuning, or RAG? A Benchmark and an Australian Law Case Study [9.30538764385435]
Large Language Models (LLMs) have demonstrated strong potential across legal tasks, yet the problem of legal citation prediction remains under-explored. We introduce the AusLaw Citation Benchmark, a real-world dataset comprising 55k Australian legal instances and 18,677 unique citations. We then conduct a systematic benchmarking across a range of solutions. Results show that neither general nor law-specific LLMs suffice as stand-alone solutions, with performance near zero.
arXiv Detail & Related papers (2024-12-09T07:46:14Z) - JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z) - Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG).
Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection.
It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
arXiv Detail & Related papers (2023-10-17T18:18:32Z) - GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.