A Toy Model of Universality: Reverse Engineering How Networks Learn
Group Operations
- URL: http://arxiv.org/abs/2302.03025v2
- Date: Wed, 24 May 2023 22:13:13 GMT
- Title: A Toy Model of Universality: Reverse Engineering How Networks Learn
Group Operations
- Authors: Bilal Chughtai, Lawrence Chan, Neel Nanda
- Abstract summary: We study the universality hypothesis by examining how small neural networks learn to implement group composition.
We present a novel algorithm by which neural networks may implement composition for any finite group via mathematical representation theory.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Universality is a key hypothesis in mechanistic interpretability -- that
different models learn similar features and circuits when trained on similar
tasks. In this work, we study the universality hypothesis by examining how
small neural networks learn to implement group composition. We present a novel
algorithm by which neural networks may implement composition for any finite
group via mathematical representation theory. We then show that networks
consistently learn this algorithm by reverse engineering model logits and
weights, and confirm our understanding using ablations. By studying networks of
differing architectures trained on various groups, we find mixed evidence for
universality: using our algorithm, we can completely characterize the family of
circuits and features that networks learn on this task, but for a given network
the precise circuits learned -- as well as the order they develop -- are
arbitrary.
Related papers
- Grokking Group Multiplication with Cosets [10.255744802963926]
Algorithmic tasks have proven to be a fruitful test ground for interpreting a neural network end-to-end.
We completely reverse engineer fully connected one-hidden layer networks that have grokked'' the arithmetic of the permutation groups $S_5$ and $S_6$.
We relate how we reverse engineered the model's mechanisms and confirm our theory was a faithful description of the circuit's functionality.
arXiv Detail & Related papers (2023-12-11T18:12:18Z) - Image segmentation with traveling waves in an exactly solvable recurrent
neural network [71.74150501418039]
We show that a recurrent neural network can effectively divide an image into groups according to a scene's structural characteristics.
We present a precise description of the mechanism underlying object segmentation in this network.
We then demonstrate a simple algorithm for object segmentation that generalizes across inputs ranging from simple geometric objects in grayscale images to natural images.
arXiv Detail & Related papers (2023-11-28T16:46:44Z) - Feature emergence via margin maximization: case studies in algebraic
tasks [4.401622714202886]
We show that trained neural networks employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups.
More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies.
arXiv Detail & Related papers (2023-11-13T18:56:33Z) - The Clock and the Pizza: Two Stories in Mechanistic Explanation of
Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex.
We show that even simple learning problems can admit a surprising diversity of solutions.
arXiv Detail & Related papers (2023-06-30T17:59:13Z) - The Neural Race Reduction: Dynamics of Abstraction in Gated Networks [12.130628846129973]
We introduce the Gated Deep Linear Network framework that schematizes how pathways of information flow impact learning dynamics.
We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning.
Our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures.
arXiv Detail & Related papers (2022-07-21T12:01:03Z) - Quasi-orthogonality and intrinsic dimensions as measures of learning and
generalisation [55.80128181112308]
We show that dimensionality and quasi-orthogonality of neural networks' feature space may jointly serve as network's performance discriminants.
Our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces.
arXiv Detail & Related papers (2022-03-30T21:47:32Z) - Learning Dynamics and Structure of Complex Systems Using Graph Neural
Networks [13.509027957413409]
We trained graph neural networks to fit time series from an example nonlinear dynamical system.
We found simple interpretations of the learned representation and model components.
We successfully identified a graph translator' between the statistical interactions in belief propagation and parameters of the corresponding trained network.
arXiv Detail & Related papers (2022-02-22T15:58:16Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Emergence of Network Motifs in Deep Neural Networks [0.35911228556176483]
We show that network science tools can be successfully applied to the study of artificial neural networks.
In particular, we study the emergence of network motifs in multi-layer perceptrons.
arXiv Detail & Related papers (2019-12-27T17:05:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.