A Unified Representation Underlying the Judgment of Large Language Models
- URL: http://arxiv.org/abs/2510.27328v2
- Date: Tue, 04 Nov 2025 12:25:30 GMT
- Title: A Unified Representation Underlying the Judgment of Large Language Models
- Authors: Yi-Long Lu, Jiajun Song, Wei Wang
- Abstract summary: A central architectural question for both biological and artificial intelligence is whether judgment relies on specialized modules or a unified, domain-general resource. We show that evaluative judgments are computed along a dominant dimension, which we term the Valence-Assent Axis (VAA). The VAA functions as a control signal that steers the generative process to construct a rationale consistent with its evaluative state, even at the cost of factual accuracy.
- Score: 6.674085049223262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A central architectural question for both biological and artificial intelligence is whether judgment relies on specialized modules or a unified, domain-general resource. While the discovery of decodable neural representations for distinct concepts in Large Language Models (LLMs) has suggested a modular architecture, whether these representations are truly independent systems remains an open question. Here we provide evidence for a convergent architecture for evaluative judgment. Across a range of LLMs, we find that diverse evaluative judgments are computed along a dominant dimension, which we term the Valence-Assent Axis (VAA). This axis jointly encodes subjective valence ("what is good") and the model's assent to factual claims ("what is true"). Through direct interventions, we demonstrate that this axis drives a critical mechanism, which we identify as the subordination of reasoning: the VAA functions as a control signal that steers the generative process to construct a rationale consistent with its evaluative state, even at the cost of factual accuracy. Our discovery offers a mechanistic account for response bias and hallucination, revealing how an architecture that promotes coherent judgment can systematically undermine faithful reasoning.
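To make the axis-plus-intervention idea concrete, here is a minimal sketch on synthetic activations: a difference-of-means probe recovers a dominant evaluative direction, and an additive shift along it stands in for the paper's direct interventions. The dimensions, the `alpha` strength, and the estimator are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch (not the paper's code): estimate a dominant evaluative
# axis from hidden states with a difference-of-means probe, then "steer" by
# adding the axis back in. Synthetic data stands in for real LLM activations.
import numpy as np

rng = np.random.default_rng(0)
d = 64                                   # hidden dimension (assumed)
axis_true = rng.normal(size=d)
axis_true /= np.linalg.norm(axis_true)

# Synthetic hidden states for "positive" vs. "negative" judgments.
pos = rng.normal(size=(200, d)) + 2.0 * axis_true
neg = rng.normal(size=(200, d)) - 2.0 * axis_true

# Difference-of-means estimate of the Valence-Assent-like axis.
vaa = pos.mean(axis=0) - neg.mean(axis=0)
vaa /= np.linalg.norm(vaa)
print("recovered/true axis alignment:", abs(vaa @ axis_true))

# An intervention in the spirit of the paper: shift a state along the axis.
h = rng.normal(size=d)                   # some activation to steer
alpha = 4.0                              # steering strength (assumed)
h_steered = h + alpha * vaa
print("projection before/after:", h @ vaa, h_steered @ vaa)
```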
Related papers
- The Observer-Situation Lattice: A Unified Formal Basis for Perspective-Aware Cognition [2.28438857884398]
We introduce the Observer-Situation Lattice (OSL), a unified mathematical structure that provides a single, coherent semantic space for perspective-aware cognition. OSL is a finite complete lattice where each element represents a unique observer-situation pair, allowing for a principled and scalable approach to belief management. We present two key algorithms that operate on this lattice: (i) Relativized Belief Propagation, an incremental update algorithm that efficiently propagates new information, and (ii) Minimal Contradiction Decomposition, a graph-based procedure that identifies and isolates contradiction components.
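Reading "finite complete lattice of observer-situation pairs" literally, one hedged toy construction is a powerset lattice over (observer, situation) atoms, with a belief asserted at one element propagated to everything above it. The paper's actual OSL and its two algorithms are surely richer than this sketch.

```python
# Hedged sketch of the lattice idea (the paper's actual OSL construction may
# differ): elements are frozensets of (observer, situation) atoms, so the
# powerset ordered by inclusion is a finite complete lattice, and a new
# belief propagates to every element above the one where it was asserted.
from itertools import chain, combinations

atoms = [("alice", "s1"), ("alice", "s2"), ("bob", "s1")]

def powerset(xs):
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

lattice = powerset(atoms)                       # 2^3 = 8 elements
beliefs = {el: set() for el in lattice}         # element -> believed facts

def assert_belief(element, fact):
    """Incremental update: push `fact` to `element` and everything above it."""
    for el in lattice:
        if element <= el:                       # el is above (a superset)
            beliefs[el].add(fact)

assert_belief(frozenset([("alice", "s1")]), "door_is_open")
top = frozenset(atoms)
print(beliefs[top])                             # {'door_is_open'}
```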
arXiv Detail & Related papers (2026-03-02T03:15:36Z)
- Quantifying Model Uniqueness in Heterogeneous AI Ecosystems [1.1162481475388237]
We introduce a statistical framework for auditing model uniqueness based on In-Silico Quasi-Experimental Design. By enforcing matched interventions across models, we isolate intrinsic model identity and quantify uniqueness as the Peer-Inexpressible Residual (PIER). These results move trustworthy AI beyond explaining single models.
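One way to picture a "peer-inexpressible residual" (an assumed linear reading, not the paper's definition): regress a target model's responses under matched interventions on its peers' responses, and take the unexplained variance as the uniqueness score.

```python
# Illustrative reading of a peer-inexpressible residual on synthetic data:
# how much of a model's output variance can peers not express as a linear
# combination of their own outputs under matched prompts?
import numpy as np

rng = np.random.default_rng(1)
n_prompts, n_peers = 500, 4
peers = rng.normal(size=(n_prompts, n_peers))        # peer scores per prompt

# Target model = mix of peers plus an idiosyncratic component.
idio = rng.normal(size=n_prompts)
target = peers @ rng.normal(size=n_peers) + 0.7 * idio

coef, *_ = np.linalg.lstsq(peers, target, rcond=None)
residual = target - peers @ coef
pier = residual.var() / target.var()                 # fraction peers cannot express
print(f"PIER-like uniqueness score: {pier:.3f}")
```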
arXiv Detail & Related papers (2026-01-30T13:41:53Z)
- Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents [0.0]
We introduce Project Ariadne, a novel XAI framework to audit the causal integrity of agentic reasoning. Unlike existing interpretability methods that rely on surface-level textual similarity, Project Ariadne performs hard interventions (do-calculus) on intermediate reasoning nodes. Our empirical evaluation of state-of-the-art models reveals a persistent Faithfulness Gap.
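A toy version of a hard intervention on a reasoning node, with the chain written as plain functions so the do-operation is explicit. The framework itself audits LLM agents, so every name below is hypothetical.

```python
# Toy sketch of a hard intervention (do-calculus style) on an intermediate
# reasoning node; treat this hand-built chain as illustrative only.
def premise():            return 7
def step(p):              return p * 2           # intermediate reasoning node
def answer(s):            return s + 1           # final answer node

def run(do_step=None):
    p = premise()
    s = step(p) if do_step is None else do_step  # do(step := value)
    return answer(s)

factual = run()
intervened = run(do_step=0)                      # sever the stated rationale
# If the final answer were insensitive to the intervention, the stated
# reasoning would not be causally faithful to the output.
print(factual, intervened, "faithful:", factual != intervened)
```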
arXiv Detail & Related papers (2026-01-05T18:05:29Z)
- A Monad-Based Clause Architecture for Artificial Age Score (AAS) in Large Language Models [0.0]
This work develops an engineering-oriented, clause-based architecture that imposes law-like constraints on large language models. Twenty selected monads from Leibniz's Monadology are grouped into six bundles. Six minimal Python implementations are instantiated in numerical experiments acting on channel-level quantities.
arXiv Detail & Related papers (2025-12-03T12:48:40Z)
- RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark [71.3555284685426]
We introduce RealUnify, a benchmark designed to evaluate bidirectional capability synergy. RealUnify comprises 1,000 meticulously human-annotated instances spanning 10 categories and 32 subtasks. We find that current unified models still struggle to achieve effective synergy, indicating that architectural unification alone is insufficient.
arXiv Detail & Related papers (2025-09-29T15:07:28Z)
- REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model [29.40036398095681]
We define the Reasoning Manifold, a latent low-dimensional geometric structure formed by the internal representations corresponding to all correctly reasoned generations. We build REMA, a framework that explains the origins of failures by quantitatively comparing the spatial relationships of internal model representations corresponding to both erroneous and correct reasoning samples. Our experiments on diverse language and multimodal models and tasks demonstrate the low-dimensional nature of the reasoning manifold and the high separability between erroneous and correct reasoning representations.
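A minimal sketch of the manifold comparison on synthetic data: fit a low-dimensional subspace to correct-reasoning representations and measure how far erroneous ones fall off it. PCA stands in for whatever manifold estimator REMA actually uses.

```python
# Minimal sketch of a "reasoning manifold" check on synthetic data (the
# paper's pipeline and metrics may differ): fit a top-k PCA subspace on
# correct-reasoning representations, then compare off-manifold distances.
import numpy as np

rng = np.random.default_rng(2)
d, k = 128, 5                                  # ambient dim, manifold dim
basis = np.linalg.qr(rng.normal(size=(d, k)))[0]

correct = rng.normal(size=(300, k)) @ basis.T + 0.05 * rng.normal(size=(300, d))
erroneous = rng.normal(size=(100, d))          # errors fall off the manifold

mean = correct.mean(axis=0)
_, _, vt = np.linalg.svd(correct - mean, full_matrices=False)
proj = vt[:k].T @ vt[:k]                       # projector onto top-k subspace

def off_manifold_dist(x):
    c = x - mean
    return np.linalg.norm(c - c @ proj, axis=1)

print("correct:  ", off_manifold_dist(correct).mean())
print("erroneous:", off_manifold_dist(erroneous).mean())  # much larger
```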
arXiv Detail & Related papers (2025-09-26T16:02:27Z)
- On the Fundamental Impossibility of Hallucination Control in Large Language Models [0.0]
Impossibility Theorem: no LLM performing non-trivial knowledge aggregation can simultaneously achieve truthful knowledge representation, semantic information conservation, and revelation of relevant knowledge. We prove this by modeling inference as an auction of ideas, where distributed components compete to influence responses using encoded knowledge. We show that hallucination and imagination are mathematically identical, and both violate at least one of the four essential properties.
arXiv Detail & Related papers (2025-06-04T23:28:39Z)
- The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning [2.0800882594868293]
Unified Cognitive Consciousness Theory (UCCT) casts language models as vast unconscious pattern repositories. UCCT formalizes this process as Bayesian competition between statistical priors learned in pre-training and context-driven target patterns. We ground the theory in three principles: threshold crossing, modality, and density-distance predictive power.
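The "Bayesian competition" and "threshold crossing" ingredients can be illustrated with a two-hypothesis posterior update. The prior, the likelihood ratio, and the threshold below are invented numbers, not the theory's parameters.

```python
# Back-of-the-envelope illustration of Bayesian competition between a
# pre-training prior and accumulating in-context evidence, with a
# threshold-crossing rule. All numbers are assumptions for illustration.
def posterior(prior, likelihood_target, likelihood_prior_pattern):
    """P(target pattern | context) for two competing interpretations."""
    num = prior * likelihood_target
    den = num + (1 - prior) * likelihood_prior_pattern
    return num / den

prior_target = 0.10            # target pattern is a priori unlikely
threshold = 0.5                # activation threshold (assumed)

for n_examples in range(1, 6): # in-context evidence accumulates
    # Each example favors the target pattern 3:1 (assumed likelihood ratio).
    post = posterior(prior_target, 3.0 ** n_examples, 1.0)
    print(n_examples, round(post, 3), "anchored" if post > threshold else "-")
```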
arXiv Detail & Related papers (2025-06-02T18:12:43Z)
- I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
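A toy ancestral sampler of the kind of generative process described: a latent discrete concept is drawn first, and tokens are emitted conditional on it. The vocabulary and probabilities are invented; the paper's model is more general.

```python
# Tiny ancestral sampler: tokens generated from a latent discrete concept.
# Vocabulary, concepts, and uniform probabilities are invented for
# illustration; they are not the paper's construction.
import random

random.seed(0)
concepts = {
    "weather": ["rain", "sun", "cloud", "storm"],
    "animals": ["cat", "dog", "bird", "fish"],
}

def generate(n_tokens=5):
    z = random.choice(list(concepts))            # latent discrete concept
    tokens = [random.choice(concepts[z]) for _ in range(n_tokens)]
    return z, tokens

z, toks = generate()
print(f"latent concept={z!r} -> tokens={toks}")
# The identifiability question is whether a next-token predictor trained on
# such sequences recovers a representation aligned with z.
```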
arXiv Detail & Related papers (2025-03-12T01:21:17Z)
- Disentangling Representations through Multi-task Learning [0.0]
We provide experimental and theoretical results guaranteeing the emergence of disentangled representations in agents that optimally solve classification tasks. We experimentally validate these predictions in RNNs trained to multi-task, which learn disentangled representations in the form of continuous attractors. We find that transformers are particularly suited for disentangling representations, which might explain their unique world-understanding abilities.
arXiv Detail & Related papers (2024-07-15T21:32:58Z)
- Identifiable Causal Representation Learning: Unsupervised, Multi-View, and Multi-Environment [10.814585613336778]
Causal representation learning (CRL) aims to combine the core strengths of machine learning and causality.
This thesis investigates what is possible for CRL without direct supervision, and thus contributes to its theoretical foundations.
arXiv Detail & Related papers (2024-06-19T09:14:40Z)
- Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales [54.78115855552886]
We show how to construct over-complete invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture.
With the over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner.
For robust and interpretable vision tasks at larger scales, the hierarchical invariant representation can be considered an effective alternative to traditional CNNs and invariants.
arXiv Detail & Related papers (2024-02-23T16:50:07Z)
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
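As a hedged sketch of prototype-based uncertainty (not the PAU training objective): score an embedding's soft assignment over prototypes and use the entropy of that assignment as the aleatoric ambiguity signal.

```python
# Sketch of a prototype-based uncertainty score on synthetic embeddings (the
# PAU paper's losses and training are not reproduced here): softmax over
# similarities to prototypes, with entropy as the ambiguity measure.
import numpy as np

rng = np.random.default_rng(3)
d, n_proto = 32, 8
prototypes = rng.normal(size=(n_proto, d))

def uncertainty(embedding, temperature=1.0):
    sims = prototypes @ embedding / temperature      # similarity to prototypes
    p = np.exp(sims - sims.max())
    p /= p.sum()                                     # soft prototype assignment
    return -(p * np.log(p + 1e-12)).sum()            # entropy = ambiguity

clean = prototypes[0] + 0.1 * rng.normal(size=d)     # near one prototype
corrupt = rng.normal(size=d) * 0.1                   # near nothing in particular
print("clean:  ", uncertainty(clean))
print("corrupt:", uncertainty(corrupt))              # typically higher entropy
```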
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models.
First, we show that neural causal models (NCMs) are expressive enough to encode the constraints needed for counterfactual reasoning.
Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
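The counterfactual computation an NCM must support can be shown end-to-end on a two-variable SCM with known mechanisms, via the standard abduction-action-prediction recipe:

```python
# Worked toy example of counterfactual estimation in a known SCM (an NCM
# would learn these mechanisms from data; here they are written down so the
# three steps are explicit). Model: X := U_x,  Y := 2*X + U_y.
u_x, u_y = 1.0, 0.5           # exogenous noise (normally inferred, here known)

x = u_x                       # observed X = 1.0
y = 2 * x + u_y               # observed Y = 2.5

# Abduction: infer the noise consistent with the observation.
u_y_inferred = y - 2 * x      # = 0.5

# Action: intervene do(X := 3); Prediction: recompute Y with the same noise.
x_cf = 3.0
y_cf = 2 * x_cf + u_y_inferred
print(f"Y would have been {y_cf} had X been {x_cf}")   # 6.5
```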
arXiv Detail & Related papers (2022-09-30T18:29:09Z)
- Structural Causal Models Are (Solvable by) Credal Networks [70.45873402967297]
Causal inferences can be obtained by standard algorithms for the updating of credal nets.
This contribution should be regarded as a systematic approach to represent structural causal models by credal networks.
Experiments show that approximate algorithms for credal networks can immediately be used to do causal inference in real-size problems.
arXiv Detail & Related papers (2020-08-02T11:19:36Z)
- CausalVAE: Structured Causal Disentanglement in Variational Autoencoder [52.139696854386976]
The framework of variational autoencoder (VAE) is commonly used to disentangle independent factors from observations.
We propose a new VAE-based framework named CausalVAE, which includes a Causal Layer to transform independent factors into causal endogenous ones.
Results show that the causal representations learned by CausalVAE are semantically interpretable, and their causal relationship as a Directed Acyclic Graph (DAG) is identified with good accuracy.
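The Causal Layer's core idea admits a compact linear-SCM reading, z = (I - A^T)^{-1} eps for a DAG adjacency A; the actual model learns A and adds nonlinear masks, so treat this as a sketch of the transformation only.

```python
# Sketch of a linear causal layer in the spirit of CausalVAE: independent
# exogenous factors eps are mapped to causally related latents via
# z = (I - A^T)^{-1} eps for a DAG adjacency A. Only the core linear-SCM
# reading is shown; the full model is learned and nonlinear.
import numpy as np

# DAG on 3 factors: z0 -> z1, z0 -> z2, z1 -> z2 (A[i, j] = effect of i on j).
A = np.array([[0.0, 0.8, 0.3],
              [0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0]])

rng = np.random.default_rng(4)
eps = rng.normal(size=3)                      # independent factors
z = np.linalg.solve(np.eye(3) - A.T, eps)     # causal endogenous factors

# Sanity check: z satisfies the structural equations z = A^T z + eps.
print(np.allclose(z, A.T @ z + eps))          # True
```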
arXiv Detail & Related papers (2020-04-18T20:09:34Z)