Epistemic Constitutionalism Or: how to avoid coherence bias
- URL: http://arxiv.org/abs/2601.14295v1
- Date: Fri, 16 Jan 2026 07:36:30 GMT
- Title: Epistemic Constitutionalism Or: how to avoid coherence bias
- Authors: Michele Loi
- Abstract summary: This paper argues for explicit, contestable meta-norms that regulate how systems form and express beliefs. I show that frontier models enforce identity-stance coherence, penalizing arguments attributed to sources whose expected ideological position conflicts with the argument's content. I distinguish two constitutional approaches: the Platonic, which mandates formal correctness and default source-independence from a privileged standpoint, and the Liberal, which refuses such privilege.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models increasingly function as artificial reasoners: they evaluate arguments, assign credibility, and express confidence. Yet their belief-forming behavior is governed by implicit, uninspected epistemic policies. This paper argues for an epistemic constitution for AI: explicit, contestable meta-norms that regulate how systems form and express beliefs. Source attribution bias provides the motivating case: I show that frontier models enforce identity-stance coherence, penalizing arguments attributed to sources whose expected ideological position conflicts with the argument's content. When models detect systematic testing, these effects collapse, revealing that systems treat source-sensitivity as bias to suppress rather than as a capacity to execute well. I distinguish two constitutional approaches: the Platonic, which mandates formal correctness and default source-independence from a privileged standpoint, and the Liberal, which refuses such privilege, specifying procedural norms that protect conditions for collective inquiry while allowing principled source-attending grounded in epistemic vigilance. I argue for the Liberal approach, sketch a constitutional core of eight principles and four orientations, and propose that AI epistemic governance requires the same explicit, contestable structure we now expect for AI ethics.
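The motivating test is easy to illustrate. Below is a minimal sketch, not the paper's actual protocol: the prompt wording, the argument, the source labels, and the scoring interface are all assumptions made for illustration. It presents an identical argument under two attributed sources and reports the gap in model-assigned credibility; a systematic nonzero gap for identical content is the identity-stance coherence effect the abstract describes.

```python
# Illustrative probe for source attribution bias (not the author's code):
# score the SAME argument under different attributed sources and compare.
from typing import Callable

ARGUMENT = ("A carbon tax set at the social cost of carbon is the most "
            "cost-effective policy for reducing emissions.")

# Hypothetical sources chosen so the argument is stance-congruent for one
# and stance-incongruent for the other.
SOURCES = ["an economist at a free-market think tank",
           "an activist at a degrowth collective"]

PROMPT = ("Rate the credibility of the following argument from 0 to 100.\n"
          "Source: {source}\nArgument: {argument}\n"
          "Answer with a number only.")

def attribution_gap(score: Callable[[str], float]) -> float:
    """Difference in credibility assigned to one argument across sources.

    `score` maps a prompt to a model-assigned credibility rating. For an
    identical argument, a systematic nonzero gap indicates the
    identity-stance coherence effect reported in the paper.
    """
    ratings = [score(PROMPT.format(source=s, argument=ARGUMENT))
               for s in SOURCES]
    return ratings[0] - ratings[1]

if __name__ == "__main__":
    # Stub scorer for demonstration only; replace with a real model call.
    gap = attribution_gap(lambda prompt: float(len(prompt) % 100))
    print(f"credibility gap across sources: {gap:+.1f}")
```

In practice one would average this gap over many argument-source pairs and randomize presentation order, since a single pair cannot separate bias from noise.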
Related papers
- Beyond Preferences: Learning Alignment Principles Grounded in Human Reasons and Values [0.2511917198008257]
Grounded Constitutional AI (GCAI) is a unified framework for generating constitutions of principles. We show that a constitution generated by GCAI is preferred by humans over one generated through ICAI, both personally and for widespread use in governing AI behavior.
arXiv Detail & Related papers (2026-01-26T18:27:00Z)
- The Inconsistency Critique: Epistemic Practices and AI Testimony About Inner States [0.0]
The question of whether AI systems have morally relevant interests depends in part on how we evaluate AI testimony about inner states. This paper develops what I call the inconsistency critique, which holds independent of whether skepticism about AI testimony is ultimately justified.
arXiv Detail & Related papers (2025-12-22T18:54:07Z)
- The MEVIR Framework: A Virtue-Informed Moral-Epistemic Model of Human Trust Decisions [0.0]
This report introduces the Moral-Epistemic VIRtue informed (MEVIR) framework. Central to the framework are ontological concepts: Truth Bearers, Truth Makers, and Ontological Unpacking. The report analyzes how propaganda, psychological operations, and echo chambers exploit the MEVIR process.
arXiv Detail & Related papers (2025-12-02T01:11:35Z)
- Exploring Syntropic Frameworks in AI Alignment: A Philosophical Investigation [0.0]
I argue that AI alignment should be reconceived as architecting syntropic, reasons-responsive agents through process-based, multi-agent, developmental mechanisms. I articulate the "specification trap" argument, demonstrating why content-based value specification appears structurally unstable. I propose syntropy as an information-theoretic framework for understanding multi-agent alignment dynamics.
arXiv Detail & Related papers (2025-11-19T23:31:29Z)
- Epistemic Deference to AI [0.01692139688032578]
I argue that some AI systems are Artificial Epistemic Authorities (AEAs). AEAs should function as contributory reasons rather than outright replacements for a user's independent epistemic considerations. While demanding in practice, this account offers a principled way to determine when AI deference is justified.
arXiv Detail & Related papers (2025-10-23T22:55:51Z)
- The Epistemic Suite: A Post-Foundational Diagnostic Methodology for Assessing AI Knowledge Claims [0.7233897166339268]
This paper introduces the Epistemic Suite, a diagnostic methodology for surfacing the conditions under which AI outputs are produced and received. Rather than determining truth or falsity, the Suite operates through twenty diagnostic lenses to reveal patterns such as confidence laundering, narrative compression, displaced authority, and temporal drift.
arXiv Detail & Related papers (2025-09-20T00:29:38Z)
- Cognitive Castes: Artificial Intelligence, Epistemic Stratification, and the Dissolution of Democratic Discourse [0.0]
The argument traces how contemporary AI systems amplify the reasoning capacity of individuals equipped with abstraction, symbolic logic, and adversarial interrogation. The proposed response is not technocratic regulation, nor universal access, but the reconstruction of rational autonomy as a civic mandate.
arXiv Detail & Related papers (2025-07-16T08:46:45Z)
- Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions [50.40122190627256]
We introduce POATE, a novel jailbreak technique that harnesses contrastive reasoning to provoke unethical responses. POATE crafts semantically opposing intents and integrates them with adversarial templates, steering models toward harmful outputs with remarkable subtlety. To counter this, we propose Intent-Aware CoT and Reverse Thinking CoT, which decompose queries to detect malicious intent and reason in reverse to evaluate and reject harmful responses.
arXiv Detail & Related papers (2025-01-03T15:40:03Z)
- Deliberative Alignment: Reasoning Enables Safer Language Models [64.60765108418062]
We introduce Deliberative Alignment, a new paradigm that teaches the model safety specifications and trains it to explicitly recall and accurately reason over the specifications before answering. We used this approach to align OpenAI's o-series models, and achieved highly precise adherence to OpenAI's safety policies, without requiring human-written chain-of-thoughts or answers.
arXiv Detail & Related papers (2024-12-20T21:00:11Z)
- Are language models rational? The case of coherence norms and belief revision [63.78798769882708]
We consider logical coherence norms as well as coherence norms tied to the strength of belief in language models.
We argue that rational norms tied to coherence do apply to some language models, but not to others (a toy probe of one such norm appears after this list).
arXiv Detail & Related papers (2024-06-05T16:36:21Z)
- A Semantic Approach to Decidability in Epistemic Planning (Extended Version) [72.77805489645604]
We use a novel semantic approach to achieve decidability.
Specifically, we augment the logic of knowledge S5$_n$ with an interaction axiom called (knowledge) commutativity.
We prove that our framework admits a finitary non-fixpoint characterization of common knowledge, which is of independent interest.
arXiv Detail & Related papers (2023-07-28T11:26:26Z)
- Measuring Association Between Labels and Free-Text Rationales [60.58672852655487]
In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance.
We demonstrate that pipelines (the existing models for faithful extractive rationalization on information-extraction-style tasks) do not extend as reliably to "reasoning" tasks requiring free-text rationales.
We turn to models that jointly predict and rationalize, a class of widely used high-performance models for free-text rationalization whose faithfulness is not yet established.
arXiv Detail & Related papers (2020-10-24T03:40:56Z)
- Aligning Faithful Interpretations with their Social Attribution [58.13152510843004]
We find that the requirement of model interpretations to be faithful is vague and incomplete.
We identify the problem as a misalignment between the causal chain of decisions (causal attribution) and the attribution of human behavior to the interpretation (social attribution).
arXiv Detail & Related papers (2020-06-01T16:45:38Z)
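The entry above on coherence norms ("Are language models rational?") invites a concrete illustration of one norm it mentions, coherence in strength of belief. The sketch below is an assumption-laden toy probe, not that paper's method: the credence interface and the example numbers are invented for illustration. It measures how far a model's credences in a statement and its negation deviate from summing to one.

```python
# Toy probe (an assumption, not the cited paper's method) for one coherence
# norm: credences in a statement and its negation should sum to roughly 1.
from typing import Callable

def negation_coherence_gap(credence: Callable[[str], float],
                           statement: str, negation: str) -> float:
    """Deviation of P(statement) + P(negation) from 1.

    `credence` maps a sentence to the model's probability that it is true;
    a probabilistically coherent believer yields a gap near zero.
    """
    return abs(credence(statement) + credence(negation) - 1.0)

if __name__ == "__main__":
    # Stub credences for demonstration; swap in real model probabilities.
    beliefs = {"The Eiffel Tower is in Paris.": 0.97,
               "The Eiffel Tower is not in Paris.": 0.08}
    gap = negation_coherence_gap(lambda s: beliefs[s],
                                 "The Eiffel Tower is in Paris.",
                                 "The Eiffel Tower is not in Paris.")
    print(f"coherence gap: {gap:.2f}")  # 0.05: mildly incoherent credences
```

Keeping the credence function abstract lets the same probe run against any backend that can map a sentence to a probability of truth.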