HalluGraph: Auditable Hallucination Detection for Legal RAG Systems via Knowledge Graph Alignment
- URL: http://arxiv.org/abs/2512.01659v1
- Date: Mon, 01 Dec 2025 13:31:06 GMT
- Title: HalluGraph: Auditable Hallucination Detection for Legal RAG Systems via Knowledge Graph Alignment
- Authors: Valentin Noël, Elimane Yassine Seidou, Charly Ken Capo-Chichi, Ghanem Amari,
- Abstract summary: We introduce HalluGraph, a graph-theoretic framework that quantifies hallucinations through structural alignment between knowledge graphs extracted from context, query, and response. Our approach produces bounded, interpretable metrics decomposed into Entity Grounding (EG), measuring whether entities in the response appear in source documents, and Relation Preservation (RP), verifying that asserted relationships are supported by context.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Legal AI systems powered by retrieval-augmented generation (RAG) face a critical accountability challenge: when an AI assistant cites case law, statutes, or contractual clauses, practitioners need verifiable guarantees that generated text faithfully represents source documents. Existing hallucination detectors rely on semantic similarity metrics that tolerate entity substitutions, a dangerous failure mode when confusing parties, dates, or legal provisions can have material consequences. We introduce HalluGraph, a graph-theoretic framework that quantifies hallucinations through structural alignment between knowledge graphs extracted from context, query, and response. Our approach produces bounded, interpretable metrics decomposed into \textit{Entity Grounding} (EG), measuring whether entities in the response appear in source documents, and \textit{Relation Preservation} (RP), verifying that asserted relationships are supported by context. On structured control documents ($>$400 words, $>$20 entities), HalluGraph achieves near-perfect discrimination ($AUC = 0.979$), while maintaining robust performance ($AUC \approx 0.89$) on challenging generative legal tasks, consistently outperforming semantic similarity baselines. The framework provides the transparency and traceability required for high-stakes legal applications, enabling full audit trails from generated assertions back to source passages.
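The two metrics named in the abstract can be illustrated with a small set-overlap sketch. This is not the paper's implementation: the abstract does not give the exact formulas, so the definitions below (fraction of response entities found in the context, fraction of response triples found in the context) are assumptions, as are the function names, the string normalization, and the convention that an empty response scores 1.0. The entities and triples in the example are invented for illustration and are assumed to have been extracted already.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match (assumed normalization)."""
    return " ".join(text.lower().split())


def entity_grounding(response_entities: Iterable[str],
                     context_entities: Iterable[str]) -> float:
    """Assumed EG: share of response entities that also appear in the context, in [0, 1]."""
    resp: Set[str] = {_norm(e) for e in response_entities}
    ctx: Set[str] = {_norm(e) for e in context_entities}
    if not resp:
        return 1.0  # assumption: a response with no entities is vacuously grounded
    return len(resp & ctx) / len(resp)


def relation_preservation(response_triples: Iterable[Triple],
                          context_triples: Iterable[Triple]) -> float:
    """Assumed RP: share of response triples that also appear in the context, in [0, 1]."""
    resp = {tuple(_norm(x) for x in t) for t in response_triples}
    ctx = {tuple(_norm(x) for x in t) for t in context_triples}
    if not resp:
        return 1.0  # assumption: no asserted relations means nothing to contradict
    return len(resp & ctx) / len(resp)


def ungrounded_entities(response_entities: Iterable[str],
                        context_entities: Iterable[str]) -> List[str]:
    """List response entities missing from the context, as a simple audit trail."""
    resp = {_norm(e) for e in response_entities}
    ctx = {_norm(e) for e in context_entities}
    return sorted(resp - ctx)


# Hypothetical example: the response swaps one party for another.
context_entities = ["Acme Corp", "Beta LLC", "2019-03-01"]
response_entities = ["Acme Corp", "Gamma Inc"]
context_triples = [("Acme Corp", "sued", "Beta LLC")]
response_triples = [("acme corp", "sued", "beta llc"),
                    ("Acme Corp", "sued", "Gamma Inc")]

eg = entity_grounding(response_entities, context_entities)
rp = relation_preservation(response_triples, context_triples)
missing = ungrounded_entities(response_entities, context_entities)
print(eg, rp, missing)
```

Because both scores are ratios over the response's own entities and triples, they stay bounded in [0, 1], and a substituted party lowers the score even when the surrounding wording is nearly identical to the source, which is the failure mode the abstract attributes to similarity-based detectors.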
Related papers
- Orchestrating Specialized Agents for Trustworthy Enterprise RAG [8.772844442593975]
One-pass retrieval-and-write pipelines often yield shallow summaries. We introduce ADORE, an agentic framework that replaces linear retrieval with iterative, user-steered investigation. Our contributions are threefold: Memory-locked synthesis, Evidence-coverage-guided execution, and section-packed long-context grounding.
arXiv Detail & Related papers (2026-01-26T08:48:41Z)
- Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance [23.470768802111007]
RebuttalAgent is a new multi-agent framework that reframes rebuttal generation as an evidence-centric planning task. Our system decomposes complex feedback into atomic concerns and dynamically constructs hybrid contexts. By generating an inspectable response plan before drafting, RebuttalAgent ensures that every argument is explicitly anchored in internal or external evidence.
arXiv Detail & Related papers (2026-01-20T17:23:51Z)
- Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images [96.43608872116347]
AnomReason is a large-scale benchmark with structured annotations as quadruples. AnomReason and AnomAgent serve as a foundation for measuring and improving the semantic plausibility of AI-generated images.
arXiv Detail & Related papers (2025-10-11T14:09:24Z)
- Audit the Whisper: Detecting Steganographic Collusion in Multi-Agent LLMs [0.0]
Audit the Whisper is a conference-grade research artifact that spans theory, benchmark design, detection, and validation. Our contributions are: (i) a channel-capacity analysis showing how interventions such as paraphrase, rate limiting, and role permutation impose quantifiable capacity penalties, operationalised via paired-run Kullback-Leibler diagnostics. We release anonymized regeneration scripts, anonymized manifests, and documentation so external auditors can reproduce every figure, satisfy double-blind requirements, and extend the framework with minimal effort.
arXiv Detail & Related papers (2025-10-05T17:51:52Z)
- Enrich-on-Graph: Query-Graph Alignment for Complex Reasoning with LLM Enriching [61.824094419641575]
Large Language Models (LLMs) struggle with hallucinations and factual errors in knowledge-intensive scenarios like knowledge graph question answering (KGQA). We attribute this to the semantic gap between structured knowledge graphs (KGs) and unstructured queries, caused by inherent differences in their focuses and structures. Existing methods usually employ resource-intensive, non-scalable reasoning on vanilla KGs, but overlook this gap. We propose a flexible framework, Enrich-on-Graph (EoG), which leverages LLMs' prior knowledge to enrich KGs and bridge the semantic gap between graphs and queries.
arXiv Detail & Related papers (2025-09-25T06:48:52Z)
- All for law and law for all: Adaptive RAG Pipeline for Legal Research [0.8819595592190884]
Retrieval-Augmented Generation (RAG) has transformed how we approach text generation tasks. This work introduces a novel end-to-end RAG pipeline that improves upon previous baselines.
arXiv Detail & Related papers (2025-08-18T17:14:03Z)
- AGENTICT$^2$S: Robust Text-to-SPARQL via Agentic Collaborative Reasoning over Heterogeneous Knowledge Graphs for the Circular Economy [42.73610751710192]
AgenticT$^2$S is a framework that decomposes knowledge graphs into subtasks managed by specialized agents. A two-stage verifier detects structurally invalid and semantically underspecified queries. Experiments on real-world circular economy KGs demonstrate that AgenticT$^2$S improves execution accuracy by 17.3%.
arXiv Detail & Related papers (2025-08-03T15:58:54Z)
- Respecting Temporal-Causal Consistency: Entity-Event Knowledge Graphs for Retrieval-Augmented Generation [69.45495166424642]
We develop a robust and discriminative QA benchmark to measure temporal, causal, and character consistency understanding in narrative documents. We then introduce Entity-Event RAG (E2RAG), a dual-graph framework that keeps separate entity and event subgraphs linked by a bipartite mapping. Across ChronoQA, our approach outperforms state-of-the-art unstructured and KG-based RAG baselines, with notable gains on causal and character consistency queries.
arXiv Detail & Related papers (2025-06-06T10:07:21Z)
- Verify-in-the-Graph: Entity Disambiguation Enhancement for Complex Claim Verification with Interactive Graph Representation [3.864321514889099]
VeGraph operates in three phases: (1) Graph Representation - an input claim is decomposed into structured triplets, forming a graph-based representation that integrates both structured and unstructured information; (2) Entity Disambiguation - VeGraph iteratively interacts with the knowledge base to resolve ambiguous entities within the graph for deeper sub-claim verification; and (3) Verification - remaining triplets are verified to complete the fact-checking process.
arXiv Detail & Related papers (2025-05-29T02:02:55Z)
- An Ontology-Driven Graph RAG for Legal Norms: A Structural, Temporal, and Deterministic Approach [0.0]
RAG systems in the legal domain face a critical challenge: standard, flat-text retrieval is blind to the hierarchical, diachronic, and causal structure of law, leading to anachronistic and unreliable answers. This paper introduces the Structure-Aware Temporal Graph RAG (SAT-Graph RAG), an ontology-driven framework designed to overcome these limitations by explicitly modeling the formal structure and diachronic nature of legal norms.
arXiv Detail & Related papers (2025-04-29T18:36:57Z)
- A Label-Free Heterophily-Guided Approach for Unsupervised Graph Fraud Detection [60.09453163562244]
We propose a Heterophily-guided Unsupervised Graph fraud dEtection approach (HUGE) for unsupervised GFD. In the estimation module, we design a novel label-free heterophily metric called HALO, which captures the critical graph properties for GFD. In the alignment-based fraud detection module, we develop a joint-GNN architecture with ranking loss and asymmetric alignment loss.
arXiv Detail & Related papers (2025-02-18T22:07:36Z)
- FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations [114.94628499698096]
We propose FactGraph, a method that decomposes the document and the summary into structured meaning representations (MRs).
MRs describe core semantic concepts and their relations, aggregating the main content in both document and summary in a canonical form, and reducing data sparsity.
Experiments on different benchmarks for evaluating factuality show that FactGraph outperforms previous approaches by up to 15%.
arXiv Detail & Related papers (2022-04-13T16:45:33Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidences in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)