Beyond Accuracy: Rethinking Hallucination and Regulatory Response in Generative AI
- URL: http://arxiv.org/abs/2509.13345v2
- Date: Thu, 23 Oct 2025 20:49:56 GMT
- Title: Beyond Accuracy: Rethinking Hallucination and Regulatory Response in Generative AI
- Authors: Zihao Li, Weiwei Yi, Jiahong Chen
- Abstract summary: Hallucination in generative AI is often treated as a technical failure to produce factually correct output. This paper critically examines how regulatory and evaluation frameworks have inherited a narrow view of hallucination.
- Score: 7.068082004005692
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hallucination in generative AI is often treated as a technical failure to produce factually correct output. Yet this framing underrepresents the broader significance of hallucinated content in language models, which may appear fluent, persuasive, and contextually appropriate while conveying distortions that escape conventional accuracy checks. This paper critically examines how regulatory and evaluation frameworks have inherited a narrow view of hallucination, one that prioritises surface verifiability over deeper questions of meaning, influence, and impact. We propose a layered approach to understanding hallucination risks, encompassing epistemic instability, user misdirection, and social-scale effects. Drawing on interdisciplinary sources and examining instruments such as the EU AI Act and the GDPR, we show that current governance models struggle to address hallucination when it manifests as ambiguity, bias reinforcement, or normative convergence. Rather than improving factual precision alone, we argue for regulatory responses that account for language's generative nature, the asymmetries between system and user, and the shifting boundaries between information, persuasion, and harm.
Related papers
- Rethinking Hallucinations: Correctness, Consistency, and Prompt Multiplicity [23.68691022958444]
Large language models (LLMs) are known to "hallucinate" by generating false or misleading outputs. We introduce prompt multiplicity, a framework for quantifying consistency in LLM evaluations. We study the role of consistency in hallucination detection and mitigation.
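The snippet above describes quantifying consistency across prompt variants. As a rough illustration only (not the paper's actual prompt-multiplicity metric), the sketch below scores how often paraphrased prompts yield the same answer; `query_model` and the example paraphrases are hypothetical stand-ins.

```python
from collections import Counter

def consistency_rate(query_model, paraphrases):
    """Fraction of paraphrased prompts whose answer matches the majority answer."""
    answers = [query_model(p).strip().lower() for p in paraphrases]
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / len(answers)

# Usage with a dummy model that always answers the same way:
dummy_model = lambda prompt: "Paris"
prompts = [
    "What is the capital of France?",
    "Name the capital city of France.",
    "France's capital is which city?",
]
print(consistency_rate(dummy_model, prompts))  # -> 1.0
```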
arXiv Detail & Related papers (2026-01-31T13:29:03Z) - Hallucination Benchmark for Speech Foundation Models [33.92968426403491]
Hallucinations in automatic speech recognition (ASR) systems refer to fluent and coherent transcriptions produced by neural ASR models that are completely unrelated to the underlying acoustic input (i.e., the speech signal). This apparent coherence can mislead subsequent processing stages and introduce serious risks, particularly in critical domains such as healthcare and law. We introduce SHALLOW, the first benchmark framework that systematically categorizes and quantifies hallucination phenomena in ASR along four complementary axes: lexical, phonetic, morphological, and semantic.
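SHALLOW's four axes cannot be reconstructed from the abstract alone, but the lexical axis can be illustrated with a toy divergence score between a reference and a fluent-but-unrelated hypothesis. This is a hedged sketch, not SHALLOW's actual metric; the phonetic, morphological, and semantic axes would require additional resources.

```python
# Toy lexical-axis check: what fraction of hypothesis words have no counterpart
# in the reference transcription? A fluent but unrelated output scores high.
def lexical_divergence(reference: str, hypothesis: str) -> float:
    ref_words = set(reference.lower().split())
    hyp_words = hypothesis.lower().split()
    if not hyp_words:
        return 0.0
    unmatched = sum(1 for w in hyp_words if w not in ref_words)
    return unmatched / len(hyp_words)

print(lexical_divergence("turn left at the next junction",
                         "the patient requires immediate surgery"))  # -> 0.8
```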
arXiv Detail & Related papers (2025-10-18T16:26:16Z) - Review of Hallucination Understanding in Large Language and Vision Models [65.29139004945712]
We present a framework for characterizing both image and text hallucinations across diverse applications. Our investigations reveal that hallucinations often stem from predictable patterns in data distributions and inherited biases. This survey provides a foundation for developing more robust and effective solutions to hallucinations in real-world generative AI systems.
arXiv Detail & Related papers (2025-09-26T09:23:08Z) - Disentangling the Drivers of LLM Social Conformity: An Uncertainty-Moderated Dual-Process Mechanism [19.07643218338789]
As large language models (LLMs) integrate into collaborative teams, their social conformity has emerged as a key concern. In humans, conformity arises from informational influence (rational use of group cues for accuracy) or normative influence (social pressure for approval). This study adapts the information cascade paradigm from behavioral economics to quantitatively disentangle the two drivers and to investigate the moderating effect of uncertainty.
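The information cascade paradigm mentioned above has a standard textbook form: agents receive noisy private signals about a binary state and observe predecessors' choices. The simulation below is a minimal sketch of that classic setup, not the study's experimental protocol; the signal-accuracy parameter and the one-signal-per-observed-choice update are simplifying assumptions.

```python
import math
import random

def run_cascade(true_state: int, n_agents: int = 10,
                signal_accuracy: float = 0.7, seed: int = 0) -> list[int]:
    """Simulate sequential binary choices from private signals plus public history."""
    rng = random.Random(seed)
    step = math.log(signal_accuracy / (1 - signal_accuracy))
    public_log_odds = 0.0
    choices = []
    for _ in range(n_agents):
        signal = true_state if rng.random() < signal_accuracy else 1 - true_state
        private_log_odds = step if signal == 1 else -step
        choice = 1 if public_log_odds + private_log_odds > 0 else 0
        choices.append(choice)
        # Simplification: each observed choice counts as one signal's worth of
        # public evidence, so a cascade locks in once |public odds| exceeds |step|.
        public_log_odds += step if choice == 1 else -step
    return choices

print(run_cascade(true_state=1))
```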
arXiv Detail & Related papers (2025-08-17T03:53:55Z) - SHALE: A Scalable Benchmark for Fine-grained Hallucination Evaluation in LVLMs [52.03164192840023]
Large Vision-Language Models (LVLMs) still suffer from hallucinations, i.e., generating content inconsistent with input or established world knowledge. We propose an automated data construction pipeline that produces scalable, controllable, and diverse evaluation data. We construct SHALE, a benchmark designed to assess both faithfulness and factuality hallucinations.
arXiv Detail & Related papers (2025-08-13T07:58:01Z) - A comprehensive taxonomy of hallucinations in Large Language Models [0.0]
Large language models (LLMs) have revolutionized natural language processing, yet their propensity for hallucination remains a critical challenge. This report provides a comprehensive taxonomy of LLM hallucinations, beginning with a formal definition and a theoretical framework. It analyzes the underlying causes, categorizing them into data-related issues, model-related factors, and prompt-related influences.
arXiv Detail & Related papers (2025-08-03T14:37:16Z) - Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models [57.834711966432685]
Bullshit, as conceptualized by philosopher Harry Frankfurt, refers to statements made without regard to their truth value. We introduce the Bullshit Index, a novel metric quantifying large language models' indifference to truth. We observe prevalent machine bullshit in political contexts, with weasel words as the dominant strategy.
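One plausible way to operationalize "indifference to truth" is to measure how weakly a model's explicit claims track its own internal belief that a statement is true. The sketch below does exactly that on placeholder data; the paper's actual Bullshit Index is defined on its own probing setup and may differ in detail.

```python
import numpy as np

def indifference_to_truth(internal_beliefs: np.ndarray, explicit_claims: np.ndarray) -> float:
    """1 - |correlation| between belief probabilities and asserted claims (1 = asserted true)."""
    r = np.corrcoef(internal_beliefs, explicit_claims)[0, 1]
    return float(1.0 - abs(r))

beliefs = np.array([0.9, 0.8, 0.2, 0.1, 0.7])  # placeholder: model's estimated P(statement is true)
claims = np.array([1, 1, 1, 1, 0])             # placeholder: what the model actually asserts
print(indifference_to_truth(beliefs, claims))  # higher value = claims track beliefs less
```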
arXiv Detail & Related papers (2025-07-10T07:11:57Z) - Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models [24.363156120809546]
We propose KIE-HVQA, the first benchmark dedicated to evaluating OCR hallucination in degraded document understanding. This dataset includes test samples spanning identity cards and invoices, with simulated real-world degradations to test OCR reliability. Experiments on Qwen2.5-VL demonstrate that our 7B-parameter model achieves a 22% absolute improvement in hallucination-free accuracy over GPT-4o.
arXiv Detail & Related papers (2025-06-25T06:44:07Z) - Machine Mirages: Defining the Undefined [1.779336682160787]
Multimodal machine intelligence systems have begun to exhibit a new class of cognitive aberrations: machine mirages. These include delusion, illusion, confabulation, hallucination, misattribution error, semantic drift, semantic compression, exaggeration, and causal inference failure. This article presents some of these errors and argues that such failures must be explicitly defined and systematically assessed.
arXiv Detail & Related papers (2025-06-03T11:45:38Z) - Embracing Contradiction: Theoretical Inconsistency Will Not Impede the Road of Building Responsible AI Systems [0.6906005491572401]
This position paper argues that the theoretical inconsistency often observed among Responsible AI (RAI) metrics should be embraced as a valuable feature rather than a flaw to be eliminated. We contend that navigating these inconsistencies, by treating metrics as divergent objectives, yields three key benefits.
arXiv Detail & Related papers (2025-05-23T17:48:09Z) - Uncertainty, bias and the institution bootstrapping problem [0.0]
We propose that misperception, specifically, agents' erroneous belief that an institution already exists, could resolve this paradox. We show how these factors collectively mitigate the bootstrapping problem. Our analysis underscores the importance of incorporating human-like cognitive constraints, not just idealized rationality, into models of institutional emergence and resilience.
arXiv Detail & Related papers (2025-04-30T12:36:06Z) - HalluLens: LLM Hallucination Benchmark [49.170128733508335]
Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as "hallucination". This paper introduces a comprehensive hallucination benchmark, incorporating both new extrinsic and existing intrinsic evaluation tasks.
arXiv Detail & Related papers (2025-04-24T13:40:27Z) - Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations [82.42811602081692]
This paper introduces a subsequence association framework to systematically trace and understand hallucinations. The key insight is that hallucinations arise when dominant hallucinatory associations outweigh faithful ones. We propose a tracing algorithm that identifies causal subsequences by analyzing hallucination probabilities across randomized input contexts.
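The tracing idea in the snippet above can be illustrated as follows: embed a candidate input subsequence into many randomized contexts and estimate how often the model still emits the hallucinated target. The sketch below assumes a hypothetical `generate` callable (prompt in, text out) and is far simpler than the paper's actual algorithm.

```python
import random

def hallucination_rate(generate, subsequence: str, target: str,
                       filler_vocab: list[str], n_trials: int = 50,
                       context_len: int = 20, seed: int = 0) -> float:
    """How often the hallucinated target appears when the subsequence is placed in random contexts."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        filler = rng.choices(filler_vocab, k=context_len)
        insert_at = rng.randrange(len(filler) + 1)
        prompt = " ".join(filler[:insert_at] + [subsequence] + filler[insert_at:])
        if target.lower() in generate(prompt).lower():
            hits += 1
    return hits / n_trials

def rank_causal_subsequences(generate, candidates, target, filler_vocab):
    """Rank candidate subsequences by how strongly they trigger the hallucinated target."""
    scores = {s: hallucination_rate(generate, s, target, filler_vocab) for s in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```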
arXiv Detail & Related papers (2025-04-17T06:34:45Z) - Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations [51.92795774118647]
We find that "verbal uncertainty" is governed by a single linear feature in the representation space of LLMs. We show that this has only moderate correlation with the actual "semantic uncertainty" of the model.
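A common recipe for extracting such a linear feature is a difference of means between hidden states from hedged versus confident generations. The sketch below uses toy activation matrices and makes no claim to match the paper's extraction or calibration procedure.

```python
import numpy as np

def verbal_uncertainty_direction(hedged_states: np.ndarray,
                                 confident_states: np.ndarray) -> np.ndarray:
    """Difference-of-means direction; inputs are (n_samples, hidden_dim) activations."""
    direction = hedged_states.mean(axis=0) - confident_states.mean(axis=0)
    return direction / np.linalg.norm(direction)

def project(states: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Score each activation by its component along the candidate uncertainty direction."""
    return states @ direction

rng = np.random.default_rng(0)
hedged = rng.normal(0.5, 1.0, size=(100, 64))      # toy activations for hedged outputs
confident = rng.normal(-0.5, 1.0, size=(100, 64))  # toy activations for confident outputs
d = verbal_uncertainty_direction(hedged, confident)
print(project(hedged[:3], d))
```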
arXiv Detail & Related papers (2025-03-18T17:51:04Z) - Uncertainty in Language Models: Assessment through Rank-Calibration [65.10149293133846]
Language Models (LMs) have shown promising performance in natural language generation.
It is crucial to correctly quantify their uncertainty in responding to given inputs.
We develop a novel and practical framework, termed Rank-Calibration, to assess uncertainty and confidence measures for LMs.
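The intuition behind rank-calibration is that responses an uncertainty measure ranks as less uncertain should, on average, be of higher quality. The toy check below compares binned quality means along the uncertainty ordering; the paper's Rank-Calibration Error is defined more carefully than this sketch.

```python
import numpy as np

def rank_calibration_gap(uncertainty: np.ndarray, quality: np.ndarray,
                         n_bins: int = 10) -> float:
    """Crude rank-calibration check: penalize bins where higher uncertainty got higher quality."""
    order = np.argsort(uncertainty)        # ascending uncertainty
    quality_sorted = quality[order]
    bins = np.array_split(quality_sorted, n_bins)
    bin_means = np.array([b.mean() for b in bins])
    violations = np.clip(np.diff(bin_means), 0.0, None)
    return float(violations.sum())

rng = np.random.default_rng(1)
u = rng.random(500)                        # toy uncertainty scores
q = 1.0 - u + rng.normal(0, 0.1, 500)      # toy quality, roughly anti-correlated with u
print(rank_calibration_gap(u, q))          # small value -> well rank-calibrated
```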
arXiv Detail & Related papers (2024-04-04T02:31:05Z) - Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generating factual errors, which are often called hallucinations.
We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms.
We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z) - Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks, including question answering (QA).
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
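A generic generate-then-reflect loop captures the spirit of the approach named above: draft an answer, ask the same model to critique its factual support, and revise. The `llm` callable, prompts, and stopping rule below are hypothetical; the paper's medical-QA protocol is not reproduced here.

```python
def answer_with_self_reflection(llm, question: str, max_rounds: int = 2) -> str:
    """Draft, self-critique, and refine an answer using a single model callable (prompt -> text)."""
    answer = llm(f"Answer the question concisely.\nQuestion: {question}\nAnswer:")
    for _ in range(max_rounds):
        critique = llm(
            "Review the answer for factual errors or unsupported claims. "
            "Reply 'OK' if it is fully supported, otherwise list the problems.\n"
            f"Question: {question}\nAnswer: {answer}\nReview:"
        )
        if critique.strip().upper().startswith("OK"):
            break
        answer = llm(
            "Revise the answer to fix the problems listed in the review.\n"
            f"Question: {question}\nAnswer: {answer}\nReview: {critique}\nRevised answer:"
        )
    return answer
```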
arXiv Detail & Related papers (2023-10-10T03:05:44Z) - Navigating the Grey Area: How Expressions of Uncertainty and Overconfidence Affect Language Models [74.07684768317705]
LMs are highly sensitive to markers of certainty in prompts, with accuracy varying by more than 80%.
We find that expressions of high certainty result in a decrease in accuracy compared to expressions of low certainty; similarly, factive verbs hurt performance, while evidentials benefit performance.
These associations may suggest that LMs' behavior is based on observed language use rather than a genuine reflection of uncertainty.
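A simple way to probe this sensitivity is to wrap identical questions in confident versus hedged framings and compare accuracy. The sketch below assumes a hypothetical `ask_model` callable and a two-marker set far smaller than the paper's taxonomy of certainty expressions.

```python
# Hypothetical framings; the paper's marker set and evaluation are richer than this.
MARKERS = {
    "high_certainty": "I am certain that the answer to the following is well known. ",
    "low_certainty": "I am not sure about this, but maybe you can help. ",
}

def accuracy_by_marker(ask_model, qa_pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Compare answer accuracy when the same questions are prefixed with different certainty markers."""
    results = {}
    for name, marker in MARKERS.items():
        correct = sum(
            1 for question, gold in qa_pairs
            if gold.lower() in ask_model(marker + question).lower()
        )
        results[name] = correct / len(qa_pairs)
    return results
```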
arXiv Detail & Related papers (2023-02-26T23:46:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.