Unambiguous Representations in Neural Networks: An Information-Theoretic Approach to Intentionality
- URL: http://arxiv.org/abs/2512.11000v1
- Date: Wed, 10 Dec 2025 19:00:34 GMT
- Title: Unambiguous Representations in Neural Networks: An Information-Theoretic Approach to Intentionality
- Authors: Francesco Lässig
- Abstract summary: We show that relational structures in network connectivity can unambiguously encode representational content. We also show that spatial position information of input neurons can be decoded from network connectivity with R^2 up to 0.844.
- Score: 0.1122155793116341
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representations pervade our daily experience, from letters representing sounds to bit strings encoding digital files. While such representations require externally defined decoders to convey meaning, conscious experience appears fundamentally different: a neural state corresponding to perceiving a red square cannot alternatively encode the experience of a green square. This intrinsic property of consciousness suggests that conscious representations must be unambiguous in a way that conventional representations are not. We formalize this intuition using information theory, defining representational ambiguity as the conditional entropy H(I|R) over possible interpretations I given a representation R. Through experiments on neural networks trained to classify MNIST digits, we demonstrate that relational structures in network connectivity can unambiguously encode representational content. Using both learned decoders and direct geometric matching, we achieve perfect (100%) accuracy for dropout-trained networks and 38% for standard backpropagation in identifying output neuron class identity, despite identical task performance, demonstrating that representational ambiguity can arise orthogonally to behavioral accuracy. We further show that spatial position information of input neurons can be decoded from network connectivity with R^2 up to 0.844. These results provide a quantitative method for measuring representational ambiguity in neural systems and demonstrate that neural networks can exhibit the low-ambiguity representations posited as necessary (though not sufficient) by theoretical accounts of consciousness.
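The central quantity in the abstract is the conditional entropy H(I|R): ambiguity is zero when each representation R is compatible with exactly one interpretation I, and maximal when every interpretation remains equally plausible given R. The snippet below is a minimal, illustrative estimator of H(I|R) from a discrete joint count table; it is not the paper's code, and the toy tables are assumptions chosen only to show the two extremes.

```python
import numpy as np

def conditional_entropy(joint_counts):
    """H(I | R) in bits from a joint count table of shape (num_interpretations, num_representations)."""
    p = joint_counts / joint_counts.sum()
    p_r = p.sum(axis=0)                                   # marginal over representations R
    h_joint = -np.sum(p[p > 0] * np.log2(p[p > 0]))       # H(I, R)
    h_r = -np.sum(p_r[p_r > 0] * np.log2(p_r[p_r > 0]))   # H(R)
    return h_joint - h_r                                  # H(I | R) = H(I, R) - H(R)

# Unambiguous: each representation admits exactly one interpretation -> 0 bits.
unambiguous = np.eye(4)
# Maximally ambiguous: every interpretation is equally compatible with every representation -> 2 bits.
ambiguous = np.ones((4, 4))
print(conditional_entropy(unambiguous), conditional_entropy(ambiguous))
```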
Related papers
- Semantic representations emerge in biologically inspired ensembles of cross-supervising neural networks [1.5346678870160888]
We present a model of representation learning by ensembles of neural networks. Each network learns to encode stimuli into an abstract representation space by cross-supervising interactions with other networks. We find that performance is optimal for small receptive fields, and that sparse connectivity between networks is nearly as accurate as all-to-all interactions.
arXiv Detail & Related papers (2025-10-16T09:30:22Z)
- Concept-Guided Interpretability via Neural Chunking [64.6429903327095]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract recurring chunks on a neural population level. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z)
- From superposition to sparse codes: interpretable representations in neural networks [3.6738925004882685]
Recent evidence suggests that neural networks encode features in superposition, meaning that input concepts are linearly overlaid within the network's representations. We present a perspective that explains this phenomenon and provides a foundation for extracting interpretable representations from neural activations (see the illustrative sketch below). Our arguments have implications for neural coding theories, AI transparency, and the broader goal of making deep learning models more interpretable.
arXiv Detail & Related papers (2025-03-03T18:49:59Z)
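As a companion to the superposition summary above, here is a minimal, self-contained sketch (an illustration added in this digest, not the paper's method) of the idea that densely overlaid activations can be unfolded into sparse, per-feature codes. It packs more synthetic features than dimensions into an activation space and recovers sparse codes with a plain ISTA loop; the feature dictionary is assumed known here, which real interpretability work would have to learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy superposition: 10 sparse features packed into a 6-dimensional activation space.
n_features, n_dims = 10, 6
W = rng.normal(size=(n_features, n_dims))
W /= np.linalg.norm(W, axis=1, keepdims=True)            # unit-norm feature directions

def sample_batch(n, k=2):
    """Each sample activates k random features; the activation is their linear overlay."""
    S = np.zeros((n, n_features))
    for i in range(n):
        idx = rng.choice(n_features, size=k, replace=False)
        S[i, idx] = rng.uniform(0.5, 1.5, size=k)
    return S, S @ W

def ista(a, W, lam=0.05, steps=500):
    """Recover sparse codes s minimizing 0.5*||a - sW||^2 + lam*||s||_1 via ISTA."""
    eta = 1.0 / np.linalg.eigvalsh(W @ W.T).max()         # step size from the Lipschitz constant
    s = np.zeros((a.shape[0], W.shape[0]))
    for _ in range(steps):
        s = s - eta * ((s @ W - a) @ W.T)                 # gradient step on the reconstruction term
        s = np.sign(s) * np.maximum(np.abs(s) - eta * lam, 0.0)  # soft threshold (L1 proximal step)
    return s

S_true, A = sample_batch(200)
S_hat = ista(A, W)
print("mean |error| on active features:", np.abs(S_hat - S_true)[S_true > 0].mean())
```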
- Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
- Closed-Form Interpretation of Neural Network Latent Spaces with Symbolic Gradients [0.0]
It has been demonstrated that artificial neural networks like autoencoders or Siamese networks encode meaningful concepts in their latent spaces. We introduce a framework for finding closed-form interpretations of neurons in latent spaces of artificial neural networks.
arXiv Detail & Related papers (2024-09-09T03:26:07Z)
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Visualizing Neural Network Imagination [2.1749194587826026]
In certain situations, neural networks will represent environment states in their hidden activations.
Our goal is to visualize what environment states the networks are representing.
We define a quantitative interpretability metric and use it to demonstrate that hidden states can be highly interpretable.
arXiv Detail & Related papers (2024-05-10T11:43:35Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs (see the probe sketch below).
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
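To make the OOD observation above concrete, the harness below (a hedged sketch added in this digest, not the paper's experiment) measures how far a classifier's average prediction drifts from the constant prediction fitted to its training data as inputs move further out of distribution. The random linear `model` is a placeholder assumption; any trained classifier returning class probabilities could be dropped in.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Placeholder "model": a fixed random linear map followed by softmax.
W = rng.normal(size=(16, 5))
model = lambda x: softmax(x @ W)

x_train = rng.normal(size=(1000, 16))                 # stand-in for in-distribution inputs
baseline = model(x_train).mean(axis=0)                # constant prediction fitted to the training data

# Push inputs further from the training distribution and track the distance between
# the average prediction and the constant baseline.
for scale in [1, 2, 4, 8, 16]:
    x_ood = rng.normal(size=(1000, 16)) * scale
    drift = np.abs(model(x_ood).mean(axis=0) - baseline).sum()   # L1 distance
    print(f"input scale {scale:2d}: L1 distance to constant baseline = {drift:.3f}")
```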
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- On Tractable Representations of Binary Neural Networks [23.50970665150779]
We consider the compilation of a binary neural network's decision function into tractable representations such as Ordered Binary Decision Diagrams (OBDDs) and Sentential Decision Diagrams (SDDs).
In experiments, we show that it is feasible to obtain compact representations of neural networks as SDDs (a toy compilation sketch follows below).
arXiv Detail & Related papers (2020-04-05T03:21:26Z)
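As a toy illustration of the compilation idea above (an addition of this digest, not the paper's algorithm, and deliberately naive), the sketch below enumerates the decision function of a single binarized neuron and reduces it to an ordered binary decision diagram via Shannon expansion with node sharing. Real compilers avoid enumerating all input patterns; this only shows what a reduced OBDD of a threshold function looks like.

```python
# Decision function of a tiny binarized "neuron": step(w . x + b) over 0/1 inputs.
WEIGHTS, BIAS = (2, -1, 1, -2), 0

def neuron(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS >= 0

def compile_obdd(f, n_vars):
    """Compile a Boolean function f(tuple of 0/1 bits) into a reduced OBDD of nested tuples."""
    unique = {}                                           # (var, low, high) -> shared node

    def build(i, assign):
        if i == n_vars:
            return f(assign)                              # terminal: True or False
        low = build(i + 1, assign + (0,))                 # Shannon cofactor x_i = 0
        high = build(i + 1, assign + (1,))                # Shannon cofactor x_i = 1
        if low == high:                                   # reduction rule: drop redundant tests
            return low
        return unique.setdefault((i, low, high), (i, low, high))  # merge isomorphic nodes

    return build(0, ()), unique

root, nodes = compile_obdd(neuron, len(WEIGHTS))
print(f"reduced OBDD: {len(nodes)} internal nodes versus {2 ** len(WEIGHTS)} input patterns")
```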