Large Language Models Do NOT Really Know What They Don't Know
- URL: http://arxiv.org/abs/2510.09033v1
- Date: Fri, 10 Oct 2025 06:09:04 GMT
- Title: Large Language Models Do NOT Really Know What They Don't Know
- Authors: Chi Seng Cheang, Hou Pong Chan, Wenxuan Zhang, Yang Deng
- Abstract summary: Recent work suggests that large language models (LLMs) encode factuality signals in their internal representations. However, LLMs can also produce factual errors by relying on shortcuts or spurious associations.
- Score: 37.641827402866845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work suggests that large language models (LLMs) encode factuality signals in their internal representations, such as hidden states, attention weights, or token probabilities, implying that LLMs may "know what they don't know". However, LLMs can also produce factual errors by relying on shortcuts or spurious associations. These errors are driven by the same training objective that encourages correct predictions, raising the question of whether internal computations can reliably distinguish between factual and hallucinated outputs. In this work, we conduct a mechanistic analysis of how LLMs internally process factual queries by comparing two types of hallucinations based on their reliance on subject information. We find that when hallucinations are associated with subject knowledge, LLMs employ the same internal recall process as for correct responses, leading to overlapping and indistinguishable hidden-state geometries. In contrast, hallucinations detached from subject knowledge produce distinct, clustered representations that make them detectable. These findings reveal a fundamental limitation: LLMs do not encode truthfulness in their internal states but only patterns of knowledge recall, demonstrating that "LLMs don't really know what they don't know".
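The analysis described above can be pictured as a small probing experiment: collect last-token hidden states for a set of factual queries, label each query by whether the model's answer was correct or hallucinated, and test whether a linear probe separates the two. The sketch below is a minimal illustration of that setup, not the authors' code; the model name, example queries, and labels are placeholders.

```python
# Minimal hidden-state probing sketch (an illustration, not the paper's code).
# Assumptions: "gpt2" as a stand-in model, two toy queries, and hand-assigned
# correct/hallucinated labels.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; the paper studies larger LLMs
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True).eval()

def last_token_state(prompt: str, layer: int = -1) -> torch.Tensor:
    """Hidden state of the final prompt token at a chosen layer."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[layer][0, -1]  # shape: (hidden_dim,)

# Hypothetical probe data: factual queries labeled 1 if the model's own answer
# was correct and 0 if it hallucinated. A real study would use thousands of
# queries and a held-out test split.
queries = ["The capital of France is", "The author of Hamlet is"]  # placeholders
labels = [1, 0]                                                     # placeholders

X = torch.stack([last_token_state(q) for q in queries]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print("train accuracy:", probe.score(X, labels))
```

On the paper's account, a probe like this looks effective only for hallucinations that are detached from subject knowledge, whose states cluster separately; hallucinations that still engage the usual subject-recall pathway overlap with correct answers in hidden-state space and remain indistinguishable.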
Related papers
- Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives? [16.08138269588599]
We investigate the extent to which large language models (LLMs) can reliably separate incoherent and coherent stories. LLMs generate responses to rating questions that fail to satisfactorily separate the coherent and incoherent narratives.
arXiv Detail & Related papers (2025-12-08T17:58:43Z) - LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations [46.351064535592336]
Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reasoning failures. Recent studies have demonstrated that LLMs' internal states encode information regarding the truthfulness of their outputs. We show that the internal representations of LLMs encode much more information about truthfulness than previously recognized.
arXiv Detail & Related papers (2024-10-03T17:31:31Z) - LLM Internal States Reveal Hallucination Risk Faced With a Query [62.29558761326031]
Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries.
This paper investigates whether Large Language Models can estimate their own hallucination risk before response generation.
Using a probing estimator that leverages LLM self-assessment, we achieve an average hallucination estimation accuracy of 84.32% at run time; a minimal confidence-scoring sketch of this idea appears after this list.
arXiv Detail & Related papers (2024-07-03T17:08:52Z) - Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals [53.273592543786705]
Large language models (LLMs) have achieved great success, but their occasional content fabrication, or hallucination, limits their practical application.
We propose CoKE, which first probes LLMs' knowledge boundary via internal confidence given a set of questions, and then leverages the probing results to elicit the expression of the knowledge boundary.
arXiv Detail & Related papers (2024-06-16T10:07:20Z) - LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities. If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information. To address this issue, we suggest using RC on imaginary data, based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z) - Do Large Language Models Know about Facts? [60.501902866946]
Large language models (LLMs) have recently driven striking performance improvements across a range of natural language processing tasks.
We aim to evaluate the extent and scope of factual knowledge within LLMs by designing the benchmark Pinocchio.
Pinocchio contains 20K diverse factual questions that span different sources, timelines, domains, regions, and languages.
arXiv Detail & Related papers (2023-10-08T14:26:55Z)