Emergent Symbol-like Number Variables in Artificial Neural Networks
- URL: http://arxiv.org/abs/2501.06141v2
- Date: Thu, 24 Apr 2025 02:48:10 GMT
- Title: Emergent Symbol-like Number Variables in Artificial Neural Networks
- Authors: Satchel Grant, Noah D. Goodman, James L. McClelland,
- Abstract summary: We aim to understand how well we can understand Neural Network (NN) solutions through interpretable algorithms.<n>We use GRUs, LSTMs, and Transformers trained using Next Token Prediction (NTP) on numeric tasks.<n>We show through multiple causal and theoretical methods that we can interpret NN's raw activity through the lens of simplified SAs.
- Score: 34.388552536773034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: What types of numeric representations emerge in neural systems? What would a satisfying answer to this question look like? In this work, we interpret Neural Network (NN) solutions to sequence based counting tasks through a variety of lenses. We seek to understand how well we can understand NNs through the lens of interpretable Symbolic Algorithms (SAs), where SAs are defined by precise, abstract, mutable variables used to perform computations. We use GRUs, LSTMs, and Transformers trained using Next Token Prediction (NTP) on numeric tasks where the solutions to the tasks depend on numeric information only latent in the task structure. We show through multiple causal and theoretical methods that we can interpret NN's raw activity through the lens of simplified SAs when we frame the neural activity in terms of interpretable subspaces rather than individual neurons. Depending on the analysis, however, these interpretations can be graded, existing on a continuum, highlighting the philosophical question of what it means to "interpret" neural activity, and motivating us to introduce Alignment Functions to add flexibility to the existing Distributed Alignment Search (DAS) method. Through our specific analyses we show the importance of causal interventions for NN interpretability; we show that recurrent models develop graded, symbol-like number variables within their neural activity; we introduce a generalization of DAS to frame NN activity in terms of linear functions of interpretable variables; and we show that Transformers must use anti-Markovian solutions -- solutions that avoid using cumulative, Markovian hidden states -- in the absence of sufficient attention layers. We use our results to encourage interpreting NNs at the level of neural subspaces through the lens of SAs.
Related papers
- From Neurons to Neutrons: A Case Study in Interpretability [5.242869847419834]
We argue that high-dimensional neural networks can learn low-dimensional representations of their training data that are useful beyond simply making good predictions.
This indicates that such approaches to interpretability can be useful for deriving a new understanding of a problem from models trained to solve it.
arXiv Detail & Related papers (2024-05-27T17:59:35Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Identifying Interpretable Visual Features in Artificial and Biological
Neural Systems [3.604033202771937]
Single neurons in neural networks are often interpretable in that they represent individual, intuitively meaningful features.
Many neurons exhibit $textitmixed selectivity$, i.e., they represent multiple unrelated features.
We propose an automated method for quantifying visual interpretability and an approach for finding meaningful directions in network activation space.
arXiv Detail & Related papers (2023-10-17T17:41:28Z) - Sparse Autoencoders Find Highly Interpretable Features in Language
Models [0.0]
Polysemanticity prevents us from identifying concise, human-understandable explanations for what neural networks are doing internally.
We use sparse autoencoders to reconstruct the internal activations of a language model.
Our method may serve as a foundation for future mechanistic interpretability work.
arXiv Detail & Related papers (2023-09-15T17:56:55Z) - Transferability of coVariance Neural Networks and Application to
Interpretable Brain Age Prediction using Anatomical Features [119.45320143101381]
Graph convolutional networks (GCN) leverage topology-driven graph convolutional operations to combine information across the graph for inference tasks.
We have studied GCNs with covariance matrices as graphs in the form of coVariance neural networks (VNNs)
VNNs inherit the scale-free data processing architecture from GCNs and here, we show that VNNs exhibit transferability of performance over datasets whose covariance matrices converge to a limit object.
arXiv Detail & Related papers (2023-05-02T22:15:54Z) - Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z) - Invariants for neural automata [0.0]
We develop a formal framework for the investigation of symmetries and invariants of neural automata under different encodings.
Our work could be of substantial importance for related regression studies of real-world measurements with neurosymbolic processors.
arXiv Detail & Related papers (2023-02-04T11:40:40Z) - PCACE: A Statistical Approach to Ranking Neurons for CNN
Interpretability [1.0742675209112622]
We present a new statistical method for ranking the hidden neurons in any convolutional layer of a network.
We show a real-world application of our method to air pollution prediction with street-level images.
arXiv Detail & Related papers (2021-12-31T17:54:57Z) - Detecting Modularity in Deep Neural Networks [8.967870619902211]
We consider the problem of assessing the modularity exhibited by a partitioning of a network's neurons.
We propose two proxies for this: importance, which reflects how crucial sets of neurons are to network performance; and coherence, which reflects how consistently their neurons associate with features of the inputs.
We show that these partitionings, even ones based only on weights, reveal groups of neurons that are important and coherent.
arXiv Detail & Related papers (2021-10-13T20:33:30Z) - Neuron-based explanations of neural networks sacrifice completeness and interpretability [67.53271920386851]
We show that for AlexNet pretrained on ImageNet, neuron-based explanation methods sacrifice both completeness and interpretability.
We show the most important principal components provide more complete and interpretable explanations than the most important neurons.
Our findings suggest that explanation methods for networks like AlexNet should avoid using neurons as a basis for embeddings.
arXiv Detail & Related papers (2020-11-05T21:26:03Z) - Stability of Algebraic Neural Networks to Small Perturbations [179.55535781816343]
Algebraic neural networks (AlgNNs) are composed of a cascade of layers each one associated to and algebraic signal model.
We show how any architecture that uses a formal notion of convolution can be stable beyond particular choices of the shift operator.
arXiv Detail & Related papers (2020-10-22T09:10:16Z) - Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z) - Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z) - Mean-Field and Kinetic Descriptions of Neural Differential Equations [0.0]
In this work we focus on a particular class of neural networks, i.e. the residual neural networks.
We analyze steady states and sensitivity with respect to the parameters of the network, namely the weights and the bias.
A modification of the microscopic dynamics, inspired by residual neural networks, leads to a Fokker-Planck formulation of the network.
arXiv Detail & Related papers (2020-01-07T13:41:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.