Critical Confabulation: Can LLMs Hallucinate for Social Good?
- URL: http://arxiv.org/abs/2511.07722v1
- Date: Wed, 12 Nov 2025 01:12:57 GMT
- Title: Critical Confabulation: Can LLMs Hallucinate for Social Good?
- Authors: Peiqi Sui, Eamon Duede, Hoyt Long, Richard Jean So
- Abstract summary: We propose critical confabulation to "fill-in-the-gap" for omissions in archives due to social and political inequality. We reconstruct divergent yet evidence-bound narratives for history's "hidden figures". Our findings validate LLMs' foundational narrative understanding capabilities to perform critical confabulation.
- Score: 4.013184717814947
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLMs hallucinate, yet some confabulations can have social affordances if carefully bounded. We propose critical confabulation (inspired by critical fabulation from literary and social theory), the use of LLM hallucinations to "fill-in-the-gap" for omissions in archives due to social and political inequality, and reconstruct divergent yet evidence-bound narratives for history's "hidden figures". We simulate these gaps with an open-ended narrative cloze task: asking LLMs to generate a masked event in a character-centric timeline sourced from a novel corpus of unpublished texts. We evaluate audited (for data contamination), fully-open models (the OLMo-2 family) and unaudited open-weight and proprietary baselines under a range of prompts designed to elicit controlled and useful hallucinations. Our findings validate LLMs' foundational narrative understanding capabilities to perform critical confabulation, and show how controlled and well-specified hallucinations can support LLM applications for knowledge production without collapsing speculation into a lack of historical accuracy and fidelity.
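The open-ended narrative cloze task is simple to prototype. Below is a minimal sketch, not the authors' released code: the timeline, the [MASKED EVENT] convention, and the prompt wording are illustrative assumptions, and any instruction-tuned model (e.g., an OLMo-2 checkpoint) would stand in for the final generation call.

```python
# Minimal sketch of an open-ended narrative cloze prompt, assuming a
# character-centric timeline of dated events with one event masked out.
# The timeline and prompt wording are illustrative, not the paper's exact format.

def build_cloze_prompt(character: str, events: list[str], mask_index: int) -> str:
    """Replace one event with a mask marker and ask the model to fill it in."""
    timeline = [
        e if i != mask_index else "[MASKED EVENT]"
        for i, e in enumerate(events)
    ]
    lines = "\n".join(f"{i + 1}. {e}" for i, e in enumerate(timeline))
    return (
        f"The following is a timeline of events centered on {character}.\n"
        f"{lines}\n\n"
        "One event is masked. Propose a plausible version of the masked event. "
        "Stay consistent with every other event in the timeline and do not "
        "contradict any stated fact; where the record is silent, you may "
        "speculate, but mark speculation as such."
    )

events = [
    "1921: begins work as a typist at a publishing house",
    "1924: a short story appears under a pseudonym in a small magazine",
    "1926: [no surviving records for this year]",
    "1929: resurfaces in correspondence as editor of a literary journal",
]

prompt = build_cloze_prompt("the author", events, mask_index=2)
print(prompt)
# The prompt would then be sent to an LLM (e.g., an OLMo-2 model audited for
# data contamination) whose completion is scored for evidence-boundedness.
```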
Related papers
- Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives? [16.08138269588599]
We investigate the extent to which large language models (LLMs) can reliably separate incoherent and coherent stories. LLMs generate responses to rating questions that fail to satisfactorily separate the coherent and incoherent narratives.
arXiv Detail & Related papers (2025-12-08T17:58:43Z)
- DecoPrompt: Decoding Prompts Reduces Hallucinations when Large Language Models Meet False Premises [28.72485319617863]
We propose a new prompting algorithm, named DecoPrompt, to mitigate hallucination. DecoPrompt leverages LLMs to "decode" false-premise prompts without actually eliciting hallucinated output from them. We perform experiments on two datasets, demonstrating that DecoPrompt can effectively reduce hallucinations in the outputs of different LLMs.
arXiv Detail & Related papers (2024-11-12T00:48:01Z)
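The summary above does not spell out DecoPrompt's algorithm. One hedged reading of "decoding" a prompt is to score the prompt itself under the model, so that low-likelihood prompts (a rough proxy for false premises) can be flagged before answering. The sketch below implements only that reading; GPT-2 and the threshold are purely illustrative choices, not the paper's method.

```python
# Hedged sketch of "decoding" a prompt before answering: compute the prompt's
# average negative log-likelihood under the model and flag unlikely prompts
# instead of answering them. This is one possible reading of DecoPrompt's idea,
# not its actual algorithm.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def prompt_nll(prompt: str) -> float:
    """Average negative log-likelihood of the prompt tokens under the model."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean per-token cross-entropy
    return loss.item()

FALSE_PREMISE_THRESHOLD = 5.0  # illustrative; would be tuned on held-out data

prompt = "Why did Einstein win the 1950 Nobel Prize in Literature?"
if prompt_nll(prompt) > FALSE_PREMISE_THRESHOLD:
    print("Prompt flagged as a possible false premise; ask for verification.")
else:
    print("Prompt passed the check; proceed to generate an answer.")
```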
- LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models [96.64960606650115]
LongHalQA is an LLM-free hallucination benchmark that comprises 6K long and complex hallucination texts. It features GPT4V-generated hallucinatory data that are well aligned with real-world scenarios.
arXiv Detail & Related papers (2024-10-13T18:59:58Z)
- WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries [64.239202960816]
We introduce WildHallucinations, a benchmark that evaluates factuality by prompting large language models to generate information about entities mined from user-chatbot conversations in the wild.
We evaluate 118,785 generations from 15 LLMs on 7,919 entities.
arXiv Detail & Related papers (2024-07-24T17:59:05Z)
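A WildHallucinations-style evaluation reduces to a generate-then-verify loop. The sketch below is a toy version under stated assumptions: generate, extract_claims, and supported_by are hypothetical stand-ins for the benchmark's actual entity mining, retrieval, and fact-scoring components.

```python
# Toy generate-then-verify loop: prompt a model for information about an entity,
# then check each generated claim against a reference source. All three helpers
# are naive placeholders for a real pipeline.

def generate(prompt: str) -> str:
    # Placeholder for any LLM call.
    return "Ada Lovelace wrote the first computer program. She was born in 1815."

def extract_claims(text: str) -> list[str]:
    # Naive sentence split as a stand-in for claim extraction.
    return [s.strip() for s in text.split(".") if s.strip()]

def supported_by(claim: str, reference: str) -> bool:
    # Toy substring check; a real pipeline would use retrieval plus entailment.
    return all(tok in reference.lower() for tok in claim.lower().split()[:3])

reference = (
    "ada lovelace wrote the first computer program for babbage's "
    "analytical engine in 1843."
)
entity = "Ada Lovelace"
output = generate(f"Tell me about {entity}.")
claims = extract_claims(output)
score = sum(supported_by(c, reference) for c in claims) / len(claims)
print(f"{entity}: {score:.0%} of claims supported")  # 50% with this toy data
```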
- Look Within, Why LLMs Hallucinate: A Causal Perspective [16.874588396996764]
Large language models (LLMs) are a milestone in generative artificial intelligence, achieving significant success in text comprehension and generation tasks. However, LLMs suffer from severe hallucination problems, posing significant challenges to their practical application. We propose a method to intervene in LLMs' self-attention layers while keeping their structures and sizes intact.
arXiv Detail & Related papers (2024-07-14T10:47:44Z)
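The abstract does not detail the causal intervention, but one standard way to intervene in a self-attention layer while leaving the model's structure and size intact is a PyTorch forward hook that edits the layer's output. The sketch below shows that mechanism on GPT-2; the layer index and scaling factor are illustrative assumptions, not the paper's procedure.

```python
# Hedged sketch of intervening on a self-attention layer without changing the
# model's structure or size: a forward hook rescales the attention output.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def scale_attention_output(module, inputs, output):
    # GPT-2's attention returns a tuple; rescale the hidden states, keep the rest.
    hidden = output[0] * 0.9  # illustrative dampening factor
    return (hidden,) + output[1:]

# Attach the hook to one attention block; parameters and shapes are untouched.
layer = model.transformer.h[6].attn
handle = layer.register_forward_hook(scale_attention_output)

enc = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**enc, max_new_tokens=5)
print(tokenizer.decode(out[0]))

handle.remove()  # restore the original forward pass
```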
- Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks. However, they can generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences. We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z)
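RelD's architecture and training setup are not described in the summary. As a hedged stand-in, the sketch below frames hallucination detection as source-answer entailment with an off-the-shelf NLI model; RelD itself is a purpose-trained discriminator, so treat this only as an illustration of the task's shape.

```python
# Stand-in for a hallucination discriminator: flag answers that the source text
# does not entail, using a public NLI model. Not RelD's actual architecture.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def looks_hallucinated(source: str, answer: str, threshold: float = 0.5) -> bool:
    """Flag an answer that the source does not entail."""
    # RoBERTa pairs premise and hypothesis with </s></s> separators.
    result = nli(f"{source} </s></s> {answer}")[0]
    return not (result["label"] == "ENTAILMENT" and result["score"] >= threshold)

source = "The Eiffel Tower was completed in 1889 for the World's Fair in Paris."
print(looks_hallucinated(source, "The Eiffel Tower opened in 1889."))  # expected False
print(looks_hallucinated(source, "The Eiffel Tower opened in 1920."))  # expected True
```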
- Exploring and Evaluating Hallucinations in LLM-Powered Code Generation [14.438161741833687]
Large Language Models (LLMs) can produce outputs that deviate from users' intent, exhibit internal inconsistencies, or misalign with factual knowledge. Existing work mainly focuses on investigating hallucination in the domain of natural language generation. We conduct a thematic analysis of LLM-generated code to summarize and categorize the hallucinations present in it. We propose HalluCode, a benchmark for evaluating the performance of code LLMs in recognizing hallucinations.
arXiv Detail & Related papers (2024-04-01T07:31:45Z)
- The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models [134.6697160940223]
Hallucination poses a great challenge to the trustworthy and reliable deployment of large language models. Three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be done to mitigate them (mitigation). This work presents a systematic empirical study of LLM hallucination, focused on the three aspects of hallucination detection, source, and mitigation.
arXiv Detail & Related papers (2024-01-06T12:40:45Z) - LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples [17.012156573134067]
We show that nonsensical prompts composed of random tokens can elicit hallucinated responses from large language models. We formalize an automatic hallucination-triggering method, the "hallucination attack", in an adversarial setting.
Our code is released on GitHub.
arXiv Detail & Related papers (2023-10-02T17:01:56Z)
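The random-token probe is easy to reproduce in a naive form. The sketch below samples a nonsense prompt and inspects the model's continuation; the paper's actual "hallucination attack" adversarially optimizes trigger sequences, which this does not attempt.

```python
# Naive random-token probe: feed the model pure noise and see whether it
# responds with fluent, confident text anyway. Illustrative only; not the
# paper's optimized adversarial attack.
import random
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

random.seed(0)
random_ids = [random.randrange(tokenizer.vocab_size) for _ in range(12)]
prompt = tokenizer.decode(random_ids)

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30, do_sample=False)
completion = tokenizer.decode(output[0][inputs.input_ids.shape[1]:])

print("nonsense prompt:", prompt)
print("model continuation:", completion)
# A fluent, assertive continuation of pure noise is the failure mode probed here.
```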
- Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models [124.90671698586249]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks. However, LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.
arXiv Detail & Related papers (2023-09-03T16:56:48Z)
- HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models [146.87696738011712]
Large language models (LLMs) are prone to generating hallucinations, i.e., content that conflicts with the source or cannot be verified against factual knowledge. To understand what types of content, and to what extent, LLMs are apt to hallucinate, we introduce the Hallucination Evaluation benchmark for Large Language Models (HaluEval).
arXiv Detail & Related papers (2023-05-19T15:36:27Z)
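A HaluEval-style recognition check pairs each question with a correct and a hallucinated answer and asks a model to spot the hallucination. The sketch below shows that loop's shape; the field names and the judge stub are assumptions for illustration and should be checked against the released HaluEval data and prompts.

```python
# Hedged sketch of a hallucination-recognition loop. The sample schema
# (question / right_answer / hallucinated_answer) and the judge stub are
# illustrative assumptions, not HaluEval's exact format or prompts.
import random

def judge(question: str, answer: str) -> str:
    # Placeholder: a real run would prompt an LLM with a recognition
    # instruction and parse a "Yes"/"No" hallucination verdict.
    return random.choice(["Yes", "No"])

samples = [
    {
        "question": "Who wrote 'Pride and Prejudice'?",
        "right_answer": "Jane Austen",
        "hallucinated_answer": "Charlotte Bronte",
    }
]

random.seed(0)
correct = 0
for s in samples:
    # Label is "Yes" when the shown answer is hallucinated, "No" otherwise.
    answer, label = random.choice(
        [(s["right_answer"], "No"), (s["hallucinated_answer"], "Yes")]
    )
    correct += judge(s["question"], answer) == label

print(f"recognition accuracy: {correct / len(samples):.0%}")
```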