Sparse Autoencoder Neural Operators: Model Recovery in Function Spaces
- URL: http://arxiv.org/abs/2509.03738v2
- Date: Thu, 23 Oct 2025 01:32:48 GMT
- Title: Sparse Autoencoder Neural Operators: Model Recovery in Function Spaces
- Authors: Bahareh Tolooshams, Ailsa Shen, Anima Anandkumar
- Abstract summary: We introduce a framework that extends sparse autoencoders (SAEs) to lifted spaces and infinite-dimensional function spaces, enabling mechanistic interpretability of large neural operators (NO). We compare the inference and training dynamics of SAEs, lifted-SAEs, and SAE neural operators. We highlight how lifting and operator modules introduce beneficial inductive biases, enabling faster recovery, improved recovery of smooth concepts, and robust inference across varying resolutions, a property unique to neural operators.
- Score: 75.45093712182624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We frame the problem of unifying representations in neural models as one of sparse model recovery and introduce a framework that extends sparse autoencoders (SAEs) to lifted spaces and infinite-dimensional function spaces, enabling mechanistic interpretability of large neural operators (NO). While the Platonic Representation Hypothesis suggests that neural networks converge to similar representations across architectures, the representational properties of neural operators remain underexplored despite their growing importance in scientific computing. We compare the inference and training dynamics of SAEs, lifted-SAE, and SAE neural operators. We highlight how lifting and operator modules introduce beneficial inductive biases, enabling faster recovery, improved recovery of smooth concepts, and robust inference across varying resolutions, a property unique to neural operators.
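To make the recovery setup concrete, the following is a minimal sketch of a vanilla sparse autoencoder of the kind the paper generalizes: an overcomplete ReLU encoder, a linear decoder, and a reconstruction loss with an L1 sparsity penalty. All dimensions, weights, and the penalty strength here are illustrative assumptions, not the authors' architecture or training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: a 16-dim input mapped to a
# 64-dim (overcomplete) dictionary of latent "concepts".
d_in, d_hidden = 16, 64

# Randomly initialized encoder/decoder weights (untrained).
W_e = rng.normal(0.0, 0.1, (d_hidden, d_in))
b_e = np.zeros(d_hidden)
W_d = rng.normal(0.0, 0.1, (d_in, d_hidden))

def encode(x):
    # ReLU yields non-negative codes; with an L1 penalty during
    # training, most entries are driven to zero (sparsity).
    return np.maximum(W_e @ x + b_e, 0.0)

def decode(z):
    # Linear decoder: reconstruction as a sparse combination of
    # dictionary atoms (columns of W_d).
    return W_d @ z

def sae_loss(x, l1=1e-3):
    # Squared reconstruction error plus L1 sparsity penalty on the code.
    z = encode(x)
    x_hat = decode(z)
    return np.sum((x - x_hat) ** 2) + l1 * np.sum(np.abs(z))

x = rng.normal(size=d_in)
loss = sae_loss(x)
```

The lifted and neural-operator variants compared in the paper replace these finite-dimensional linear maps with maps between function spaces, which is what allows inference at resolutions unseen during training.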
Related papers
- Graph neural networks uncover structure and functions underlying the activity of simulated neural assemblies [0.0]
Graph neural networks trained to predict observable dynamics can be used to decompose the temporal activity of complex heterogeneous systems into simple, interpretable representations. We apply this framework to simulated neural assemblies with thousands of neurons and demonstrate that it can jointly reveal the connectivity matrix, the neuron types, the signaling functions, and in some cases hidden external stimuli.
arXiv Detail & Related papers (2026-02-11T01:59:27Z) - Neuronal Group Communication for Efficient Neural representation [85.36421257648294]
This paper addresses the question of how to build large neural systems that learn efficient, modular, and interpretable representations. We propose Neuronal Group Communication (NGC), a theory-driven framework that reimagines a neural network as a dynamical system of interacting neuronal groups. NGC treats weights as transient interactions between embedding-like neuronal states, with neural computation unfolding through iterative communication among groups of neurons.
arXiv Detail & Related papers (2025-10-19T14:23:35Z) - Concept-Guided Interpretability via Neural Chunking [64.6429903327095]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract recurring chunks on a neural population level. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z) - Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z) - Artificial Kuramoto Oscillatory Neurons [65.16453738828672]
It has long been known in both neuroscience and AI that "binding" between neurons leads to a form of competitive learning where representations are compressed in order to represent more abstract concepts in deeper layers of the network. We introduce Artificial Kuramoto Oscillatory Neurons, together with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, uncertainty quantification, and reasoning.
arXiv Detail & Related papers (2024-10-17T17:47:54Z) - Formation of Representations in Neural Networks [8.79431718760617]
How complex, structured, and transferable representations emerge in modern neural networks has remained a mystery. We propose the Canonical Representation Hypothesis (CRH), which posits a set of six alignment relations to universally govern the formation of representations. We show that the breaking of CRH leads to the emergence of reciprocal power-law relations between R, W, and G, which we refer to as the Polynomial Alignment Hypothesis (PAH).
arXiv Detail & Related papers (2024-10-03T21:31:01Z) - The Cooperative Network Architecture: Learning Structured Networks as Representation of Sensory Patterns [3.9848584845601014]
We introduce the Cooperative Network Architecture (CNA), a model that represents sensory signals using structured, recurrently connected networks of neurons, termed "nets". We demonstrate that net fragments can be learned without supervision and flexibly recombined to encode novel patterns, enabling figure completion and resilience to noise.
arXiv Detail & Related papers (2024-07-08T06:22:10Z) - Spiking representation learning for associative memories [0.0]
We introduce a novel artificial spiking neural network (SNN) that performs unsupervised representation learning and associative memory operations. The architecture of our model derives from the neocortical columnar organization and combines feedforward projections for learning hidden representations and recurrent projections for forming associative memories.
arXiv Detail & Related papers (2024-06-05T08:30:11Z) - Automated Natural Language Explanation of Deep Visual Neurons with Large Models [43.178568768100305]
This paper proposes a novel post-hoc framework for generating semantic explanations of neurons with large foundation models.
Our framework is designed to be compatible with various model architectures and datasets, enabling automated and scalable neuron interpretation.
arXiv Detail & Related papers (2023-10-16T17:04:51Z) - INO: Invariant Neural Operators for Learning Complex Physical Systems with Momentum Conservation [8.218875461185016]
We introduce a novel integral neural operator architecture to learn physical models with fundamental conservation laws automatically guaranteed.
As applications, we demonstrate the expressivity and efficacy of our model in learning complex material behaviors from both synthetic and experimental datasets.
arXiv Detail & Related papers (2022-12-29T16:40:41Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.