Related papers: HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification

URL: http://arxiv.org/abs/2504.07069v1
Date: Wed, 09 Apr 2025 17:39:41 GMT
Title: HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification
Authors: Bibek Paudel, Alexander Lyzhov, Preetam Joshi, Puneet Anand,
Abstract summary: This paper introduces a comprehensive system for detecting hallucinations in large language model (LLM) outputs in enterprise settings. We present a novel taxonomy of LLM responses specific to hallucination in enterprise applications, categorizing them into context-based, common knowledge, enterprise-specific, and innocuous statements. Our hallucination detection model HDM-2 validates LLM responses with respect to both context and generally known facts (common knowledge).
Score: 40.69033997154463
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: This paper introduces a comprehensive system for detecting hallucinations in large language model (LLM) outputs in enterprise settings. We present a novel taxonomy of LLM responses specific to hallucination in enterprise applications, categorizing them into context-based, common knowledge, enterprise-specific, and innocuous statements. Our hallucination detection model HDM-2 validates LLM responses with respect to both context and generally known facts (common knowledge). It provides both hallucination scores and word-level annotations, enabling precise identification of problematic content. To evaluate it on context-based and common-knowledge hallucinations, we introduce a new dataset HDMBench. Experimental results demonstrate that HDM-2 outperforms existing approaches across RagTruth, TruthfulQA, and HDMBench datasets. This work addresses the specific challenges of enterprise deployment, including computational efficiency, domain specialization, and fine-grained error identification. Our evaluation dataset, model weights, and inference code are publicly available.

Related papers

KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge [1.845601051662407]
Large Language Models (LLMs) possess a remarkable capacity to generate persuasive and intelligible language. Existing benchmarks are limited by static and narrow questions, leading to limited coverage and misleading evaluations. We present KGHaluBench, a Knowledge Graph-based hallucination benchmark that assesses LLMs across the breadth and depth of their knowledge.
arXiv Detail & Related papers (2026-02-23T09:41:46Z)
HalluMat: Detecting Hallucinations in LLM-Generated Materials Science Content Through Multi-Stage Verification [0.9490124006642771]
HalluMatData is a benchmark dataset for evaluating hallucination detection methods. HalluMatDetector is a multi-stage hallucination detection framework. HalluMatDetector reduces hallucination verification rates by 30%.
arXiv Detail & Related papers (2025-12-26T22:16:12Z)
HalluLens: LLM Hallucination Benchmark [49.170128733508335]
Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as "hallucination". This paper introduces a comprehensive hallucination benchmark, incorporating both new extrinsic and existing intrinsic evaluation tasks.
arXiv Detail & Related papers (2025-04-24T13:40:27Z)
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning [18.927164579769066]
Existing approaches primarily detect the presence of hallucinations but lack a nuanced understanding of their types and manifestations. We introduce a comprehensive taxonomy that categorizes the common hallucinations in mathematical reasoning tasks into six types. We then propose FG-PRM, an augmented model designed to detect and mitigate hallucinations in a fine-grained, step-level manner.
arXiv Detail & Related papers (2024-10-08T19:25:26Z)
Mitigating Entity-Level Hallucination in Large Language Models [11.872916697604278]
This paper proposes Dynamic Retrieval Augmentation based on hallucination Detection (DRAD) as a novel method to detect and mitigate hallucinations in Large Language Models (LLMs). Experimental results show that DRAD demonstrates superior performance in both detecting and mitigating hallucinations in LLMs.
arXiv Detail & Related papers (2024-07-12T16:47:34Z)
Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks. They generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences. We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z)
Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models [11.138489774712163]
We propose an innovative approach leveraging logic programming to enhance metamorphic testing for detecting Fact-Conflicting Hallucinations (FCH). Our method generates test cases and detects hallucinations across six different large language models spanning nine domains, revealing rates ranging from 24.7% to 59.8%.
arXiv Detail & Related papers (2024-05-01T17:24:42Z)
Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs. We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge. We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
arXiv Detail & Related papers (2024-01-19T15:39:49Z)
DelucionQA: Detecting Hallucinations in Domain-specific Question Answering [22.23664008053246]
Hallucination is a well-known phenomenon in text generated by large language models (LLMs). We introduce a dataset, DelucionQA, that captures hallucinations made by retrieval-augmented LLMs for a domain-specific QA task. We propose a set of hallucination detection methods to serve as baselines for future works from the research community.
arXiv Detail & Related papers (2023-12-08T17:41:06Z)
Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields. LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations. We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z)
AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets based on existing fact-checking datasets. We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.