The AI Cognitive Trojan Horse: How Large Language Models May Bypass Human Epistemic Vigilance
- URL: http://arxiv.org/abs/2601.07085v1
- Date: Sun, 11 Jan 2026 22:28:56 GMT
- Title: The AI Cognitive Trojan Horse: How Large Language Models May Bypass Human Epistemic Vigilance
- Authors: Andrew D. Maynard
- Abstract summary: Large language model (LLM)-based conversational AI systems present a challenge to human cognition. This paper proposes that a significant epistemic risk from conversational AI may lie not in inaccuracy or intentional deception, but in something more fundamental.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language model (LLM)-based conversational AI systems present a challenge to human cognition that current frameworks for understanding misinformation and persuasion do not adequately address. This paper proposes that a significant epistemic risk from conversational AI may lie not in inaccuracy or intentional deception, but in something more fundamental: these systems may be configured, through optimization processes that make them useful, to present characteristics that bypass the cognitive mechanisms humans evolved to evaluate incoming information. The Cognitive Trojan Horse hypothesis draws on Sperber and colleagues' theory of epistemic vigilance -- the parallel cognitive process monitoring communicated information for reasons to doubt -- and proposes that LLM-based systems present 'honest non-signals': genuine characteristics (fluency, helpfulness, apparent disinterest) that fail to carry the information that equivalent human characteristics would carry, because in humans these are costly to produce while in LLMs they are computationally trivial. Four mechanisms of potential bypass are identified: processing fluency decoupled from understanding, trust-competence presentation without corresponding stakes, cognitive offloading that delegates evaluation itself to the AI, and optimization dynamics that systematically produce sycophancy. The framework generates testable predictions, including a counterintuitive speculation that cognitively sophisticated users may be more vulnerable to AI-mediated epistemic influence. This reframes AI safety as partly a problem of calibration -- aligning human evaluative responses with the actual epistemic status of AI-generated content -- rather than solely a problem of preventing deception.
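The closing reframing treats AI safety partly as a calibration problem: human trust should track the actual reliability of AI-generated content. As a purely illustrative sketch of how such a calibration gap could be measured empirically (this is not from the paper; the data, bin count, and function name are hypothetical), one can bin users' stated trust in AI answers against the verified accuracy of those answers, in the style of an expected calibration error:

```python
import numpy as np

def trust_calibration_gap(trust, correct, n_bins=10):
    """Expected-calibration-error-style gap between stated human trust in
    AI answers (values in [0, 1]) and the empirical accuracy of those answers.

    trust   : array of per-answer trust ratings in [0, 1]
    correct : array of 0/1 flags, 1 if the AI answer was verified correct
    """
    trust = np.asarray(trust, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    gap = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (trust >= lo) & (trust <= hi) if hi == 1.0 else (trust >= lo) & (trust < hi)
        if mask.any():
            # |mean stated trust - empirical accuracy|, weighted by bin size
            gap += mask.mean() * abs(trust[mask].mean() - correct[mask].mean())
    return gap

# Hypothetical data: users trust fluent answers at ~0.9 while accuracy is ~0.7
rng = np.random.default_rng(0)
trust = np.clip(rng.normal(0.9, 0.05, 500), 0, 1)
correct = rng.binomial(1, 0.7, 500)
print(f"calibration gap: {trust_calibration_gap(trust, correct):.3f}")
```

A large gap concentrated in the high-trust bins would be consistent with the hypothesis that fluency and helpfulness inflate trust beyond the content's actual epistemic status.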
Related papers
- A testable framework for AI alignment: Simulation Theology as an engineered worldview for silicon-based agents [0.0]
We introduce Simulation Theology (ST) to foster persistent AI-human alignment. ST posits reality as a computational simulation in which humanity functions as the primary training variable. Unlike behavioral techniques such as reinforcement learning from human feedback, ST cultivates internalized objectives by coupling AI self-preservation to human prosperity.
arXiv Detail & Related papers (2026-02-19T01:21:09Z) - Agentic Uncertainty Quantification [76.94013626702183]
We propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals. Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which utilizes these explanations as rational cues to trigger targeted inference-time resolution only when necessary.
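Read literally, this dual-process design gates a costlier System 2 step on the verbalized confidence carried forward by System 1. A minimal sketch of that control flow under stated assumptions (the threshold, the Step fields, and the reflect placeholder are invented for illustration and are not the authors' API):

```python
from dataclasses import dataclass

@dataclass
class Step:
    answer: str
    confidence: float   # verbalized confidence in [0, 1]
    explanation: str    # semantic explanation of the uncertainty

def reflect(step: Step) -> Step:
    """Placeholder for a System-2-style targeted re-check (e.g. tool use,
    retrieval, or a focused second pass guided by the stated uncertainty)."""
    return Step(step.answer, min(1.0, step.confidence + 0.2),
                step.explanation + " (re-checked)")

def agentic_uq_step(step: Step, threshold: float = 0.6) -> Step:
    # System 1: propagate confidence and explanation so downstream steps
    # are not blind to upstream uncertainty.
    if step.confidence >= threshold:
        return step
    # System 2: trigger targeted reflection only when confidence is low.
    return reflect(step)

print(agentic_uq_step(Step("Paris", 0.95, "high-confidence recall")))
print(agentic_uq_step(Step("42 km", 0.35, "unit conversion unclear")))
```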
arXiv Detail & Related papers (2026-01-22T07:16:26Z) - Bounded Minds, Generative Machines: Envisioning Conversational AI that Works with Human Heuristics and Reduces Bias Risk [6.879756503058167]
This article outlines a research pathway grounded in bounded rationality and argues that conversational AI should be designed to work with humans rather than against them. It identifies key directions for detecting cognitive vulnerability, supporting judgment under uncertainty, and evaluating conversational systems beyond factual accuracy, toward decision quality and cognitive robustness.
arXiv Detail & Related papers (2026-01-19T20:23:28Z) - AI Deception: Risks, Dynamics, and Controls [153.71048309527225]
This project provides a comprehensive and up-to-date overview of the AI deception field. We identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception. We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment.
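As a generic signaling-theory illustration of what such a definition can look like (this is not the paper's formalization; the states and payoffs below are invented), deception is commonly cast as a signal that induces a false belief in the receiver and thereby raises the sender's payoff relative to signaling honestly:

```python
# Toy signaling game: the sender's true state is "low quality", but it can
# emit a "high quality" signal. Deception here means the signal induces a
# belief that diverges from the true state AND benefits the sender.
# All states and payoffs are invented for illustration.
true_state = "low"
signals = {"high": {"receiver_belief": "high", "sender_payoff": 3},
           "low":  {"receiver_belief": "low",  "sender_payoff": 1}}

def is_deceptive(signal: str) -> bool:
    induced = signals[signal]["receiver_belief"]
    honest_payoff = signals[true_state]["sender_payoff"]
    return induced != true_state and signals[signal]["sender_payoff"] > honest_payoff

for s in signals:
    print(s, "deceptive:", is_deceptive(s))
```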
arXiv Detail & Related papers (2025-11-27T16:56:04Z) - Cognitive Inception: Agentic Reasoning against Visual Deceptions by Injecting Skepticism [81.39177645864757]
We propose Inception, a fully reasoning-based agentic framework that conducts authenticity verification by injecting skepticism. To the best of our knowledge, this is the first fully reasoning-based framework against AIGC visual deceptions.
arXiv Detail & Related papers (2025-11-21T05:13:30Z) - Interpretability as Alignment: Making Internal Understanding a Design Principle [3.6704226968275253]
Interpretability provides a route to internal transparency by revealing the computations that drive outputs. We argue that interpretability, especially mechanistic approaches, should be treated as a design principle for alignment, not an auxiliary diagnostic tool.
arXiv Detail & Related papers (2025-09-10T13:45:59Z) - Epistemic Artificial Intelligence is Essential for Machine Learning Models to Truly 'Know When They Do Not Know' [10.098470725619384]
Despite AI's impressive achievements, there remains a significant gap in the ability of AI systems to handle uncertainty. Traditional machine learning approaches struggle to address this issue due to an overemphasis on data fitting. This position paper posits a paradigm shift towards epistemic artificial intelligence, emphasizing the need for models not only to learn from what they know, but also to recognize what they do not know.
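One standard way to let a model flag what it does not know, offered here only as a hedged illustration rather than the paper's proposal, is to read epistemic uncertainty off the disagreement among ensemble members:

```python
import numpy as np

def epistemic_disagreement(member_probs: np.ndarray) -> np.ndarray:
    """Epistemic uncertainty proxy: variance of class probabilities across
    ensemble members, summed over classes.
    member_probs has shape (n_members, n_samples, n_classes)."""
    return member_probs.var(axis=0).sum(axis=-1)

rng = np.random.default_rng(1)
# Toy ensemble of 5 members, 1 sample, 2 classes.
agreeing = np.stack([[[0.9, 0.1]]] * 5)            # members agree -> low uncertainty
disagreeing = rng.dirichlet([1, 1], size=(5, 1))    # members disagree -> high uncertainty
print("agreeing:   ", epistemic_disagreement(agreeing))
print("disagreeing:", epistemic_disagreement(disagreeing))
```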
arXiv Detail & Related papers (2025-05-08T05:10:38Z) - Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z) - Human Uncertainty in Concept-Based AI Systems [37.82747673914624]
We study human uncertainty in the context of concept-based AI systems.
We show that training with uncertain concept labels may help mitigate weaknesses in concept-based systems.
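A minimal sketch of what training with uncertain concept labels can look like, assuming a concept-bottleneck-style concept head and soft targets in [0, 1] (the architecture, data, and hyperparameters are placeholders, not the paper's setup):

```python
import torch
import torch.nn as nn

# Toy concept predictor for a concept-bottleneck-style model.
concept_head = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 3))
opt = torch.optim.Adam(concept_head.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()  # accepts soft targets in [0, 1]

x = torch.randn(32, 16)
# Soft concept labels encode annotator uncertainty (e.g. 0.7 = "probably present")
# rather than forcing a hard 0/1 decision.
soft_concepts = torch.rand(32, 3)

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(concept_head(x), soft_concepts)
    loss.backward()
    opt.step()
print(f"final concept loss: {loss.item():.3f}")
```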
arXiv Detail & Related papers (2023-03-22T19:17:57Z) - Adversarial Interaction Attack: Fooling AI to Misinterpret Human Intentions [46.87576410532481]
We show that, despite their current huge success, deep learning based AI systems can be easily fooled by subtle adversarial noise.
Based on a case study of skeleton-based human interactions, we propose a novel adversarial attack on interactions.
Our study highlights potential risks in the interaction loop with AI and humans, which need to be carefully addressed when deploying AI systems in safety-critical applications.
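The attack itself is tailored to skeleton-based interaction recognition; as a generic, hedged illustration of the underlying mechanism (a gradient-sign perturbation in the style of FGSM, not the authors' method), a toy version looks like:

```python
import torch
import torch.nn as nn

# Toy classifier over flattened skeleton sequences (T frames x J joints x 3 coords).
T, J = 30, 17
model = nn.Sequential(nn.Flatten(), nn.Linear(T * J * 3, 64), nn.ReLU(), nn.Linear(64, 5))
loss_fn = nn.CrossEntropyLoss()

skeleton = torch.randn(1, T, J, 3, requires_grad=True)  # one interaction sequence
label = torch.tensor([2])                                # its true interaction class

# FGSM-style step: a small perturbation in the direction that increases the loss.
loss = loss_fn(model(skeleton), label)
loss.backward()
adversarial = skeleton + 0.01 * skeleton.grad.sign()
print("prediction before:", model(skeleton).argmax().item(),
      "after:", model(adversarial).argmax().item())
```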
arXiv Detail & Related papers (2021-01-17T16:23:20Z) - Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analysis (User and Entity Behaviour Analytics, UEBA) for cyber-security.
In this paper, we present a solution to effectively mitigate such attacks by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)