Can LLMs Detect Their Own Hallucinations?
- URL: http://arxiv.org/abs/2511.11087v1
- Date: Fri, 14 Nov 2025 09:03:09 GMT
- Title: Can LLMs Detect Their Own Hallucinations?
- Authors: Sora Kadotani, Kosuke Nishida, Kyosuke Nishida,
- Abstract summary: Large language models (LLMs) can generate fluent responses, but sometimes hallucinate facts.<n>We formulate hallucination detection as a classification task of a sentence.<n>We propose a framework for estimating LLMs' capability of hallucination detection and a classification method using Chain-of-Thought (CoT) to extract knowledge from their parameters.
- Score: 13.071420274309551
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) can generate fluent responses, but sometimes hallucinate facts. In this paper, we investigate whether LLMs can detect their own hallucinations. We formulate hallucination detection as a classification task of a sentence. We propose a framework for estimating LLMs' capability of hallucination detection and a classification method using Chain-of-Thought (CoT) to extract knowledge from their parameters. The experimental results indicated that GPT-$3.5$ Turbo with CoT detected $58.2\%$ of its own hallucinations. We concluded that LLMs with CoT can detect hallucinations if sufficient knowledge is contained in their parameters.
Related papers
- Can Hallucinations Help? Boosting LLMs for Drug Discovery [8.960425754918974]
Hallucinations in large language models (LLMs) are often viewed as undesirable.<n>We find that hallucinations significantly improve predictive accuracy for some models.<n>We categorize over 18,000 beneficial hallucinations, with structural misdescriptions emerging as the most impactful type.
arXiv Detail & Related papers (2025-01-23T16:45:51Z) - MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models [26.464489158584463]
Large language models (LLMs) are starting to complement traditional information seeking mechanisms such as web search.<n>LLMs are prone to hallucinations, generating plausible yet factually incorrect or fabricated information.<n>This work conducts a pioneering study on hallucinations in LLM-generated responses to real-world healthcare queries from patients.
arXiv Detail & Related papers (2024-09-29T00:09:01Z) - Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks.
They generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences.
We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z) - ANAH: Analytical Annotation of Hallucinations in Large Language Models [65.12177400764506]
We present $textbfANAH$, a dataset that offers $textbfAN$alytical $textbfA$nnotation of hallucinations in Large Language Models.
ANAH consists of 12k sentence-level annotations for 4.3k LLM responses covering over 700 topics, constructed by a human-in-the-loop pipeline.
Thanks to the fine granularity of the hallucination annotations, we can quantitatively confirm that the hallucinations of LLMs accumulate in the answer and use ANAH to train and evaluate hallucination annotators.
arXiv Detail & Related papers (2024-05-30T17:54:40Z) - Do LLMs Know about Hallucination? An Empirical Investigation of LLM's
Hidden States [19.343629282494774]
Large Language Models (LLMs) can make up answers that are not real, and this is known as hallucination.
This research aims to see if, how, and to what extent LLMs are aware of hallucination.
arXiv Detail & Related papers (2024-02-15T06:14:55Z) - The Dawn After the Dark: An Empirical Study on Factuality Hallucination
in Large Language Models [134.6697160940223]
hallucination poses great challenge to trustworthy and reliable deployment of large language models.
Three key questions should be well studied: how to detect hallucinations (detection), why do LLMs hallucinate (source), and what can be done to mitigate them.
This work presents a systematic empirical study on LLM hallucination, focused on the the three aspects of hallucination detection, source and mitigation.
arXiv Detail & Related papers (2024-01-06T12:40:45Z) - Alleviating Hallucinations of Large Language Models through Induced
Hallucinations [67.35512483340837]
Large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information.
We propose a simple textitInduce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
arXiv Detail & Related papers (2023-12-25T12:32:49Z) - HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [102.56792377624927]
hallucinations inherent in machine-generated data remain under-explored.
We present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm.
Our method successfully mitigates 44.6% hallucinations relatively and maintains competitive performance compared to LLaVA.
arXiv Detail & Related papers (2023-11-22T04:52:58Z) - A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions [40.79317187623401]
The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP)
LLMs are prone to hallucination, generating plausible yet nonfactual content.
This phenomenon raises significant concerns over the reliability of LLMs in real-world information retrieval systems.
arXiv Detail & Related papers (2023-11-09T09:25:37Z) - HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large
Language Models [146.87696738011712]
Large language models (LLMs) are prone to generate hallucinations, i.e., content that conflicts with the source or cannot be verified by the factual knowledge.
To understand what types of content and to which extent LLMs are apt to hallucinate, we introduce the Hallucination Evaluation benchmark for Large Language Models (HaluEval)
arXiv Detail & Related papers (2023-05-19T15:36:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.