Graphing the Truth: Structured Visualizations for Automated Hallucination Detection in LLMs
- URL: http://arxiv.org/abs/2512.00663v1
- Date: Sat, 29 Nov 2025 23:09:15 GMT
- Title: Graphing the Truth: Structured Visualizations for Automated Hallucination Detection in LLMs
- Authors: Tanmay Agrawal
- Abstract summary: This paper introduces a framework that organizes proprietary knowledge and model-generated content into interactive visual knowledge graphs. Users can diagnose inconsistencies, identify weak reasoning chains, and supply corrective feedback.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models have rapidly advanced in their ability to interpret and generate natural language. In enterprise settings, they are frequently augmented with closed-source domain knowledge to deliver more contextually informed responses. However, operational constraints such as limited context windows and inconsistencies between pre-training data and supplied knowledge often lead to hallucinations, some of which appear highly credible and escape routine human review. Current mitigation strategies either depend on costly, large-scale gold-standard Q&A curation or rely on secondary model verification, neither of which offers deterministic assurance. This paper introduces a framework that organizes proprietary knowledge and model-generated content into interactive visual knowledge graphs. The objective is to provide end users with a clear, intuitive view of potential hallucination zones by linking model assertions to underlying sources of truth and indicating confidence levels. Through this visual interface, users can diagnose inconsistencies, identify weak reasoning chains, and supply corrective feedback. The resulting human-in-the-loop workflow creates a structured feedback loop that can enhance model reliability and continuously improve response quality.
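The abstract describes the framework only at a conceptual level. As a rough illustration of the core idea, linking model assertions to underlying sources of truth with confidence-weighted edges and surfacing unsupported assertions as potential hallucination zones, the following minimal Python sketch may help. The graph schema, the choice of the networkx library, and the lexical-overlap confidence heuristic are assumptions made for illustration; a real system would presumably score support with retrieval or entailment models rather than string similarity.

    # Minimal sketch (not the authors' implementation): organize model assertions and
    # source-of-truth passages into a graph whose edge weights act as confidence levels.
    # Node naming, the threshold, and the scoring heuristic are illustrative assumptions.
    import networkx as nx
    from difflib import SequenceMatcher

    def support_score(assertion: str, source: str) -> float:
        """Crude lexical-overlap stand-in for a real entailment/confidence model."""
        return SequenceMatcher(None, assertion.lower(), source.lower()).ratio()

    def build_truth_graph(assertions, sources, threshold=0.5):
        """Link each model assertion to knowledge-base passages that support it.

        Assertions with no supporting edge above `threshold` are returned as
        potential hallucination zones for human review.
        """
        g = nx.Graph()
        for i, a in enumerate(assertions):
            g.add_node(f"assertion:{i}", text=a, kind="assertion")
        for j, s in enumerate(sources):
            g.add_node(f"source:{j}", text=s, kind="source")

        # Add a confidence-weighted edge wherever a source plausibly supports an assertion.
        for i, a in enumerate(assertions):
            for j, s in enumerate(sources):
                score = support_score(a, s)
                if score >= threshold:
                    g.add_edge(f"assertion:{i}", f"source:{j}", confidence=round(score, 2))

        flagged = [n for n, d in g.nodes(data=True)
                   if d["kind"] == "assertion" and g.degree(n) == 0]
        return g, flagged

    if __name__ == "__main__":
        graph, suspect = build_truth_graph(
            assertions=["The warranty period is 24 months.",
                        "Refunds are processed within 3 business days."],
            sources=["Our standard warranty period is 24 months from purchase."],
        )
        print("Potential hallucination zones:", suspect)

In this sketch, the second assertion has no supporting edge and would be flagged for human review, mirroring the human-in-the-loop feedback loop described in the abstract; the resulting graph could then be rendered by any interactive visualization front end.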
Related papers
- Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval [60.25608870901428]
Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs). We propose the task of fact-checking without retrieval, focusing on the verification of arbitrary natural language claims, independent of their source robustness.
arXiv Detail & Related papers (2026-03-05T18:42:51Z) - Hallucination Detection and Mitigation in Large Language Models [0.0]
Large Language Models (LLMs) and Large Reasoning Models (LRMs) offer transformative potential for high-stakes domains like finance and law. Their tendency to hallucinate, generating factually incorrect or unsupported content, poses a critical reliability risk. This paper introduces a comprehensive framework for hallucination management, built on a continuous improvement cycle driven by root cause awareness.
arXiv Detail & Related papers (2026-01-14T23:19:37Z) - REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance [14.932352020762991]
We propose the REason-guided Fact-checking with Latent EXplanations (REFLEX) paradigm. It is a plug-and-play, self-refining paradigm that leverages the internal knowledge of the backbone model to improve both verdict accuracy and explanation quality. With only 465 self-refined training samples, REFLEX achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-11-25T12:06:23Z) - Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection [71.8243083897721]
Vision-language models often hallucinate details, generating non-existent objects or inaccurate attributes that compromise output reliability. We present a novel framework that leverages the model's self-consistency between long responses and short answers to generate preference pairs for training.
arXiv Detail & Related papers (2025-09-27T10:37:11Z) - Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models [24.363156120809546]
We propose KIE-HVQA, the first benchmark dedicated to evaluating OCR hallucination in degraded document understanding. The dataset includes test samples spanning identity cards and invoices, with simulated real-world degradations for OCR reliability. Experiments on Qwen2.5-VL demonstrate that our 7B-parameter model achieves a 22% absolute improvement in hallucination-free accuracy over GPT-4o.
arXiv Detail & Related papers (2025-06-25T06:44:07Z) - From Hallucinations to Facts: Enhancing Language Models with Curated Knowledge Graphs [20.438680406650967]
This paper addresses language model hallucination by integrating curated knowledge graph (KG) triples to anchor responses in empirical data. We aim to generate responses that are both linguistically fluent and deeply rooted in factual accuracy and contextual relevance.
arXiv Detail & Related papers (2024-12-24T20:16:10Z) - On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
Multimodal generative models have sparked critical discussions on their reliability, fairness and potential for misuse. We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space. Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
arXiv Detail & Related papers (2024-11-21T09:46:55Z) - Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z) - KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction [2.9526207670430384]
We introduce a framework for consistency and validation when using generative models to validate knowledge graphs.
The design is easy to adapt and extend, and can be used to verify any kind of graph-structured data.
arXiv Detail & Related papers (2024-04-24T15:27:25Z) - VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z) - A Controllable Model of Grounded Response Generation [122.7121624884747]
Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process.
We propose a framework that we call controllable grounded response generation (CGRG).
We show that using this framework, a transformer based model with a novel inductive attention mechanism, trained on a conversation-like Reddit dataset, outperforms strong generation baselines.
arXiv Detail & Related papers (2020-05-01T21:22:08Z)