Emergent Symbol-like Number Variables in Artificial Neural Networks
- URL: http://arxiv.org/abs/2501.06141v1
- Date: Fri, 10 Jan 2025 18:03:46 GMT
- Title: Emergent Symbol-like Number Variables in Artificial Neural Networks
- Authors: Satchel Grant, Noah D. Goodman, James L. McClelland
- Abstract summary: We show that artificial neural models do indeed develop analogs of interchangeable, mutable, latent number variables. We then show how the symbol-like variables change over the course of training and find a strong correlation between the models' task performance and the alignment of their symbol-like representations. Finally, we show that in all cases, some degree of gradience exists in these neural symbols, highlighting the difficulty of finding simple, interpretable symbolic stories of how neural networks perform numeric tasks.
- Score: 34.388552536773034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: What types of numeric representations emerge in Neural Networks (NNs)? To what degree do NNs induce abstract, mutable, slot-like numeric variables, and in what situations do these representations emerge? How do these representations change over learning, and how can we understand the neural implementations in ways that are unified across different NNs? In this work, we approach these questions by first training sequence-based neural systems using Next Token Prediction (NTP) objectives on numeric tasks. We then seek to understand the neural solutions through the lens of causal abstractions or symbolic algorithms. We use a combination of causal interventions and visualization methods to find that artificial neural models do indeed develop analogs of interchangeable, mutable, latent number variables purely from the NTP objective. We then ask how variations on the tasks and model architectures affect the models' learned solutions, finding that these symbol-like numeric representations do not form for every variant of the task and that transformers solve the problem in a notably different way than their recurrent counterparts. We then show how the symbol-like variables change over the course of training, finding a strong correlation between the models' task performance and the alignment of their symbol-like representations. Lastly, we show that in all cases, some degree of gradience exists in these neural symbols, highlighting the difficulty of finding simple, interpretable symbolic stories of how neural networks perform numeric tasks. Taken together, our results are consistent with the view that neural networks can approximate interpretable symbolic programs of number cognition, but the particular program they approximate, and the extent to which they approximate it, can vary widely depending on the network architecture, training data, extent of training, and network size.
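To make the abstract's causal-intervention methodology concrete, here is a minimal, hypothetical sketch (not the authors' released code): a small GRU is trained with a next-token-prediction loss on a toy counting task, and the hidden state recorded after a prompt containing one count is swapped into a run whose prompt contained a different count. If the network has learned an interchangeable number variable, the patched run should emit the donor count's number of response tokens. The vocabulary, task format, model size, and training budget below are assumptions made purely for illustration.

```python
# Hypothetical illustration of an interchange (activation-swap) intervention,
# in the spirit of the causal-abstraction analyses described in the abstract.
# Toy task: "BOS ITEM*n GO" should be continued with "RESP*n EOS".
import torch
import torch.nn as nn

torch.manual_seed(0)
V, H = 5, 64                      # vocabulary size, hidden size (assumed)
BOS, ITEM, GO, RESP, EOS = range(V)

class CountGRU(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(V, H)
        self.rnn = nn.GRU(H, H, batch_first=True)
        self.out = nn.Linear(H, V)

    def forward(self, tokens, h0=None):
        x = self.emb(tokens)       # (batch, seq, H)
        hs, hT = self.rnn(x, h0)   # hT: (1, batch, H)
        return self.out(hs), hT

def make_seq(n):
    """Full target sequence for count n, used for next-token prediction."""
    return [BOS] + [ITEM] * n + [GO] + [RESP] * n + [EOS]

model = CountGRU()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Next-token-prediction training on counts 1..8 (a few thousand steps is
# usually enough for this toy task; results are purely illustrative).
for step in range(3000):
    n = torch.randint(1, 9, (1,)).item()
    seq = torch.tensor([make_seq(n)])
    logits, _ = model(seq[:, :-1])
    loss = loss_fn(logits.reshape(-1, V), seq[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

@torch.no_grad()
def hidden_after(n_items):
    """Hidden state after reading BOS followed by n_items ITEM tokens."""
    _, h = model(torch.tensor([[BOS] + [ITEM] * n_items]))
    return h

@torch.no_grad()
def rollout(h, max_len=20):
    """Feed GO from hidden state h, then greedily decode; return # of RESP tokens."""
    tok, n_resp = torch.tensor([[GO]]), 0
    for _ in range(max_len):
        logits, h = model(tok, h)
        nxt = logits[0, -1].argmax().item()
        if nxt != RESP:
            break
        n_resp += 1
        tok = torch.tensor([[nxt]])
    return n_resp

h_base = hidden_after(3)               # base run saw 3 items
h_source = hidden_after(6)             # source run saw 6 items

print("base run:", rollout(h_base))    # expect ~3 if training succeeded
# Interchange intervention: continue the "base" episode from the *source* state.
# An output count near 6 is evidence for an interchangeable latent number variable.
print("patched run:", rollout(h_source))
```

Because a GRU's future behavior depends only on its current hidden state, swapping in the state recorded from the source run is the recurrent analogue of the activation-patching interventions commonly applied to transformers.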
Related papers
- From Neurons to Neutrons: A Case Study in Interpretability [5.242869847419834]
We argue that high-dimensional neural networks can learn low-dimensional representations of their training data that are useful beyond simply making good predictions.
This indicates that such approaches to interpretability can be useful for deriving a new understanding of a problem from models trained to solve it.
arXiv Detail & Related papers (2024-05-27T17:59:35Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Identifying Interpretable Visual Features in Artificial and Biological Neural Systems [3.604033202771937]
Single neurons in neural networks are often interpretable in that they represent individual, intuitively meaningful features.
Many neurons exhibit $\textit{mixed selectivity}$, i.e., they represent multiple unrelated features.
We propose an automated method for quantifying visual interpretability and an approach for finding meaningful directions in network activation space.
arXiv Detail & Related papers (2023-10-17T17:41:28Z) - Sparse Autoencoders Find Highly Interpretable Features in Language Models [0.0]
Polysemanticity prevents us from identifying concise, human-understandable explanations for what neural networks are doing internally.
We use sparse autoencoders to reconstruct the internal activations of a language model.
Our method may serve as a foundation for future mechanistic interpretability work.
arXiv Detail & Related papers (2023-09-15T17:56:55Z) - Transferability of coVariance Neural Networks and Application to Interpretable Brain Age Prediction using Anatomical Features [119.45320143101381]
Graph convolutional networks (GCN) leverage topology-driven graph convolutional operations to combine information across the graph for inference tasks.
We have studied GCNs with covariance matrices as graphs in the form of coVariance neural networks (VNNs).
VNNs inherit the scale-free data processing architecture from GCNs, and here we show that VNNs exhibit transferability of performance over datasets whose covariance matrices converge to a limit object.
arXiv Detail & Related papers (2023-05-02T22:15:54Z) - Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z) - Invariants for neural automata [0.0]
We develop a formal framework for the investigation of symmetries and invariants of neural automata under different encodings.
Our work could be of substantial importance for related regression studies of real-world measurements with neurosymbolic processors.
arXiv Detail & Related papers (2023-02-04T11:40:40Z) - PCACE: A Statistical Approach to Ranking Neurons for CNN
Interpretability [1.0742675209112622]
We present a new statistical method for ranking the hidden neurons in any convolutional layer of a network.
We show a real-world application of our method to air pollution prediction with street-level images.
arXiv Detail & Related papers (2021-12-31T17:54:57Z) - Detecting Modularity in Deep Neural Networks [8.967870619902211]
We consider the problem of assessing the modularity exhibited by a partitioning of a network's neurons.
We propose two proxies for this: importance, which reflects how crucial sets of neurons are to network performance; and coherence, which reflects how consistently their neurons associate with features of the inputs.
We show that these partitionings, even ones based only on weights, reveal groups of neurons that are important and coherent.
arXiv Detail & Related papers (2021-10-13T20:33:30Z) - Neuron-based explanations of neural networks sacrifice completeness and interpretability [67.53271920386851]
We show that for AlexNet pretrained on ImageNet, neuron-based explanation methods sacrifice both completeness and interpretability.
We show the most important principal components provide more complete and interpretable explanations than the most important neurons.
Our findings suggest that explanation methods for networks like AlexNet should avoid using neurons as a basis for embeddings.
arXiv Detail & Related papers (2020-11-05T21:26:03Z) - Stability of Algebraic Neural Networks to Small Perturbations [179.55535781816343]
Algebraic neural networks (AlgNNs) are composed of a cascade of layers, each associated with an algebraic signal model.
We show how any architecture that uses a formal notion of convolution can be stable beyond particular choices of the shift operator.
arXiv Detail & Related papers (2020-10-22T09:10:16Z) - Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z) - Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z) - Mean-Field and Kinetic Descriptions of Neural Differential Equations [0.0]
In this work we focus on a particular class of neural networks, namely residual neural networks.
We analyze steady states and sensitivity with respect to the parameters of the network, namely the weights and the bias.
A modification of the microscopic dynamics, inspired by residual neural networks, leads to a Fokker-Planck formulation of the network.
arXiv Detail & Related papers (2020-01-07T13:41:27Z)