Plausibility as Failure: How LLMs and Humans Co-Construct Epistemic Error
- URL: http://arxiv.org/abs/2512.16750v1
- Date: Thu, 18 Dec 2025 16:45:29 GMT
- Title: Plausibility as Failure: How LLMs and Humans Co-Construct Epistemic Error
- Authors: Claudia Vale Oliveira, Nelson Zagalo, Filipe Silva, Anabela Brandao, Syeda Faryal Hussain Khurrum, Joaquim Santos
- Abstract summary: This study examines how different forms of epistemic failure emerge, are masked, and are tolerated in human-AI interaction. Evaluators frequently conflated criteria such as correctness, relevance, bias, groundedness, and consistency, indicating that human judgment collapses analytical distinctions into intuitive heuristics shaped by form and fluency. The study provides implications for LLM assessment, digital literacy, and the design of trustworthy human-AI communication.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are increasingly used as epistemic partners in everyday reasoning, yet their errors remain predominantly analyzed through predictive metrics rather than through their interpretive effects on human judgment. This study examines how different forms of epistemic failure emerge, are masked, and are tolerated in human-AI interaction, where failure is understood as a relational breakdown shaped by model-generated plausibility and human interpretive judgment. We conducted a three-round, multi-LLM evaluation using interdisciplinary tasks and progressively differentiated assessment frameworks to observe how evaluators interpret model responses across linguistic, epistemic, and credibility dimensions. Our findings show that LLM errors shift from predictive to hermeneutic forms, where linguistic fluency, structural coherence, and superficially plausible citations conceal deeper distortions of meaning. Evaluators frequently conflated criteria such as correctness, relevance, bias, groundedness, and consistency, indicating that human judgment collapses analytical distinctions into intuitive heuristics shaped by form and fluency. Across rounds, we observed a systematic verification burden and cognitive drift. As tasks became denser, evaluators increasingly relied on surface cues, allowing erroneous yet well-formed answers to pass as credible. These results suggest that error is not solely a property of model behavior but a co-constructed outcome of generative plausibility and human interpretive shortcuts. Understanding AI epistemic failure therefore requires reframing evaluation as a relational interpretive process, where the boundary between system failure and human miscalibration becomes porous. The study provides implications for LLM assessment, digital literacy, and the design of trustworthy human-AI communication.
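The paper does not publish code, so the sketch below is purely illustrative: the criterion names (correctness, relevance, bias, groundedness, consistency) come from the abstract, but the 1-5 scale, the `Judgment` record, and the `criterion_means` helper are hypothetical assumptions showing one way such multi-round, multi-criterion human ratings could be organized and aggregated.

```python
# Illustrative sketch only; the round structure, scale, and aggregation are assumptions.
from dataclasses import dataclass, field
from statistics import mean

# Criteria named in the abstract; evaluators reportedly conflated them in practice.
CRITERIA = ["correctness", "relevance", "bias", "groundedness", "consistency"]

@dataclass
class Judgment:
    """One evaluator's scores (hypothetical 1-5 scale) for a single model response."""
    evaluator: str
    round_no: int   # the study used three progressively differentiated rounds
    model: str
    scores: dict = field(default_factory=dict)  # criterion -> score

def criterion_means(judgments: list[Judgment], round_no: int) -> dict:
    """Average each criterion over all judgments collected in a given round.

    Very similar means across criteria for the same response would be one
    crude signal of the 'criteria conflation' the abstract describes, where
    distinct dimensions collapse into a single fluency-driven impression.
    """
    in_round = [j for j in judgments if j.round_no == round_no]
    return {
        c: mean(j.scores[c] for j in in_round if c in j.scores)
        for c in CRITERIA
    }

# Usage sketch with fabricated toy ratings (not data from the paper)
judgments = [
    Judgment("eval_1", 1, "model_A", {c: 4 for c in CRITERIA}),
    Judgment("eval_2", 1, "model_A", {"correctness": 2, "relevance": 4,
                                      "bias": 4, "groundedness": 3, "consistency": 4}),
]
print(criterion_means(judgments, round_no=1))
```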
Related papers
- Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning [51.56484100374058]
We argue that these limitations reflect structural properties of the supervision channel rather than model scale or optimization. We develop a unified theory showing that whenever the human supervision channel is not sufficient for a latent evaluation target, it acts as an information-reducing channel.
arXiv Detail & Related papers (2026-02-26T19:11:32Z) - Epistemological Fault Lines Between Human and Artificial Intelligence [0.688204255655161]
We show that the apparent alignment between human and machine outputs conceals a deeper structural mismatch in how judgments are produced. We argue that LLMs are not agents but pattern-completion systems, formally describable as walks on high-dimensional graphs of linguistic transitions.
arXiv Detail & Related papers (2025-12-22T15:20:21Z) - Accuracy Does Not Guarantee Human-Likeness in Monocular Depth Estimators [2.466518228012258]
Deep neural networks (DNNs) have achieved superhuman accuracy on physical-based benchmarks. Monocular depth estimation is a fundamental capability for real-world applications such as autonomous driving and robotics. Research in object recognition has revealed a complex trade-off between model accuracy and human-like behavior.
arXiv Detail & Related papers (2025-12-09T01:42:00Z) - The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs [60.15472325639723]
Personality traits have long been studied as predictors of human behavior. Recent advances in Large Language Models (LLMs) suggest similar patterns may emerge in artificial systems.
arXiv Detail & Related papers (2025-09-03T21:27:10Z) - A comprehensive taxonomy of hallucinations in Large Language Models [0.0]
Large language models (LLMs) have revolutionized natural language processing, yet their propensity for hallucination remains a critical challenge. This report provides a comprehensive taxonomy of LLM hallucinations, beginning with a formal definition and a theoretical framework. It analyzes the underlying causes, categorizing them into data-related issues, model-related factors, and prompt-related influences.
arXiv Detail & Related papers (2025-08-03T14:37:16Z) - Using AI to replicate human experimental results: a motion study [0.11838866556981258]
This paper explores the potential of large language models (LLMs) as reliable analytical tools in linguistic research. It focuses on the emergence of affective meanings in temporal expressions involving manner-of-motion verbs.
arXiv Detail & Related papers (2025-07-14T14:47:01Z) - Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning [0.0]
We study logical reasoning in language models by asking whether their errors follow established human fallacy patterns. For each response, we judge logical correctness and whether it matches an ETR-predicted fallacy.
arXiv Detail & Related papers (2025-06-10T17:04:33Z) - Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z) - Failure Modes of LLMs for Causal Reasoning on Narratives [51.19592551510628]
We investigate the interaction between world knowledge and logical reasoning. We find that state-of-the-art large language models (LLMs) often rely on superficial generalizations. We show that simple reformulations of the task can elicit more robust reasoning behavior.
arXiv Detail & Related papers (2024-10-31T12:48:58Z) - Six Fallacies in Substituting Large Language Models for Human Participants [0.0]
Can AI systems like large language models (LLMs) replace human participants in behavioral and psychological research? Here I critically evaluate the "replacement" perspective and identify six interpretive fallacies that undermine its validity. Each fallacy represents a potential misunderstanding about what LLMs are and what they can tell us about human cognition.
arXiv Detail & Related papers (2024-02-06T23:28:23Z) - Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards
Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z) - Mechanisms for Handling Nested Dependencies in Neural-Network Language
Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.