Related papers: A mathematical theory for understanding when abstract representations emerge in neural networks

URL: http://arxiv.org/abs/2510.09816v1
Date: Fri, 10 Oct 2025 19:30:57 GMT
Title: A mathematical theory for understanding when abstract representations emerge in neural networks
Authors: Bin Wang, W. Jeffrey Johnston, Stefano Fusi
Abstract summary: We show that abstract representations of latent variables are guaranteed to appear in the last hidden layer of feedforward nonlinear networks. These representations reflect the structure of the desired outputs or the semantics of the input stimuli.
Score: 4.415536082342714
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recent experiments reveal that task-relevant variables are often encoded in approximately orthogonal subspaces of the neural activity space. These disentangled low-dimensional representations are observed in multiple brain areas and across different species, and are typically the result of a process of abstraction that supports simple forms of out-of-distribution generalization. The mechanisms by which such geometries emerge remain poorly understood, and the mechanisms that have been investigated are typically unsupervised (e.g., based on variational auto-encoders). Here, we show mathematically that abstract representations of latent variables are guaranteed to appear in the last hidden layer of feedforward nonlinear networks when they are trained on tasks that depend directly on these latent variables. These abstract representations reflect the structure of the desired outputs or the semantics of the input stimuli. To investigate the neural representations that emerge in these networks, we develop an analytical framework that maps the optimization over the network weights into a mean-field problem over the distribution of neural preactivations. Applying this framework to a finite-width ReLU network, we find that its hidden layer exhibits an abstract representation at all global minima of the task objective. We further extend these analyses to two broad families of activation functions and deep feedforward architectures, demonstrating that abstract representations naturally arise in all these scenarios. Together, these results provide an explanation for the widely observed abstract representations in both the brain and artificial neural networks, as well as a mathematically tractable toolkit for understanding the emergence of different kinds of representations in task-optimized, feature-learning network models.
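To make the geometry in the abstract concrete, the following is a minimal sketch of what "task-relevant variables encoded in approximately orthogonal subspaces" can look like for two binary latent variables. It is not taken from the paper: the response vectors, the helper names (`coding_vector`, `parallelism`), and the use of a cosine-based parallelism measure are all assumptions chosen for illustration.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def coding_vector(reps, var, other_value):
    """Change in response when latent `var` flips 0 -> 1, other latent held fixed."""
    if var == 0:
        return sub(reps[(1, other_value)], reps[(0, other_value)])
    return sub(reps[(other_value, 1)], reps[(other_value, 0)])

def parallelism(reps, var):
    """Cosine between the coding vectors of `var` at both values of the other latent."""
    return cosine(coding_vector(reps, var, 0), coding_vector(reps, var, 1))

# Hypothetical hidden-layer responses (made-up numbers, not data from the paper).
# Each latent shifts the response along its own fixed direction, and u is orthogonal to v.
base = [0.5, 0.5, 0.5, 0.5]
u = [1.0, 0.0, 1.0, 0.0]
v = [0.0, 1.0, 0.0, -1.0]
abstract = {
    (a, b): [base[i] + a * u[i] + b * v[i] for i in range(4)]
    for a in (0, 1)
    for b in (0, 1)
}

# Contrast case: every condition gets its own unit vector, so the four
# conditions are separable but the latents have no consistent coding direction.
one_hot = {
    (0, 0): [1.0, 0.0, 0.0, 0.0],
    (1, 0): [0.0, 1.0, 0.0, 0.0],
    (0, 1): [0.0, 0.0, 1.0, 0.0],
    (1, 1): [0.0, 0.0, 0.0, 1.0],
}

print("abstract: parallelism of latent 0 =", parallelism(abstract, 0))
print("abstract: cosine between latent axes =",
      cosine(coding_vector(abstract, 0, 0), coding_vector(abstract, 1, 0)))
print("one-hot:  parallelism of latent 0 =", parallelism(one_hot, 0))
```

In the first geometry the coding direction of each latent is the same regardless of the other latent, so a linear readout fitted on one pair of conditions would transfer to the held-out pair; in the one-hot geometry it would not. This is only a toy picture of the property the abstract describes, not a reproduction of the paper's mean-field analysis.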

Related papers

Concept-Guided Interpretability via Neural Chunking [64.6429903327095]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract recurring chunks on a neural population level. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z)
Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
Local vs distributed representations: What is the right basis for interpretability? [19.50614357801837]
We show that features obtained from sparse distributed representations are easier to interpret by human observers. Our results highlight that distributed representations constitute a superior basis for interpretability.
arXiv Detail & Related papers (2024-11-06T15:34:57Z)
Identifying Sub-networks in Neural Networks via Functionally Similar Representations [41.028797971427124]
We take a step toward automating the understanding of the network by investigating the existence of distinct sub-networks. Specifically, we explore a novel automated and task-agnostic approach based on the notion of functionally similar representations within neural networks. We show the proposed approach offers meaningful insights into the behavior of neural networks with minimal human and computational cost.
arXiv Detail & Related papers (2024-10-21T20:19:00Z)
Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
Neural Causal Abstractions [63.21695740637627]
We develop a new family of causal abstractions by clustering variables and their domains. We show that such abstractions are learnable in practical settings through Neural Causal Models. Our experiments support the theory and illustrate how to scale causal inferences to high-dimensional settings involving image data.
arXiv Detail & Related papers (2024-01-05T02:00:27Z)
Emergence and Function of Abstract Representations in Self-Supervised Transformers [0.0]
We study the inner workings of small-scale transformers trained to reconstruct partially masked visual scenes. We show that the network develops intermediate abstract representations, or abstractions, that encode all semantic features of the dataset. Using precise manipulation experiments, we demonstrate that abstractions are central to the network's decision-making process.
arXiv Detail & Related papers (2023-12-08T20:47:15Z)
Image segmentation with traveling waves in an exactly solvable recurrent neural network [71.74150501418039]
We show that a recurrent neural network can effectively divide an image into groups according to a scene's structural characteristics. We present a precise description of the mechanism underlying object segmentation in this network. We then demonstrate a simple algorithm for object segmentation that generalizes across inputs ranging from simple geometric objects in grayscale images to natural images.
arXiv Detail & Related papers (2023-11-28T16:46:44Z)
The semantic landscape paradigm for neural networks [0.0]
We introduce the semantic landscape paradigm, a conceptual and mathematical framework that describes the training dynamics of neural networks. Specifically, we show that grokking and emergence with scale are associated with percolation phenomena, and neural scaling laws are explainable in terms of the statistics of random walks on graphs.
arXiv Detail & Related papers (2023-07-18T18:48:54Z)
Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
Generalized Shape Metrics on Neural Representations [26.78835065137714]
We provide a family of metric spaces that quantify representational dissimilarity. We modify existing representational similarity measures based on canonical correlation analysis to satisfy the triangle inequality. We identify relationships between neural representations that are interpretable in terms of anatomical features and model performance.
arXiv Detail & Related papers (2021-10-27T19:48:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.