Related papers: A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations

URL: http://arxiv.org/abs/2302.03025v2
Date: Wed, 24 May 2023 22:13:13 GMT
Title: A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations
Authors: Bilal Chughtai, Lawrence Chan, Neel Nanda
Abstract summary: We study the universality hypothesis by examining how small neural networks learn to implement group composition. We present a novel algorithm by which neural networks may implement composition for any finite group via mathematical representation theory.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Universality is a key hypothesis in mechanistic interpretability -- that different models learn similar features and circuits when trained on similar tasks. In this work, we study the universality hypothesis by examining how small neural networks learn to implement group composition. We present a novel algorithm by which neural networks may implement composition for any finite group via mathematical representation theory. We then show that networks consistently learn this algorithm by reverse engineering model logits and weights, and confirm our understanding using ablations. By studying networks of differing architectures trained on various groups, we find mixed evidence for universality: using our algorithm, we can completely characterize the family of circuits and features that networks learn on this task, but for a given network the precise circuits learned -- as well as the order they develop -- are arbitrary.
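The abstract does not spell out the algorithm, but its representation-theoretic core can be illustrated in a few lines. The following is a minimal sketch, not the authors' code. It assumes the permutation representation ρ of S_3 and scores each candidate output c by the trace tr(ρ(a)ρ(b)ρ(c⁻¹)), the character of abc⁻¹. Because this representation is faithful, the trace reaches its maximum, dim ρ = 3, only when abc⁻¹ is the identity, that is, when c = ab. All helper names (`compose`, `inverse`, `perm_matrix`, `logits`) are hypothetical.

```python
from itertools import permutations

# Permutations of {0, ..., n-1} are stored as tuples: p[i] is the image of i.

def compose(p, q):
    """Return p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def perm_matrix(p):
    """Permutation representation: rho(p) maps basis vector e_j to e_{p[j]}."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Plain product of two square integer matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def logits(a, b, group):
    """Score every candidate c by the character tr(rho(a) rho(b) rho(c^-1)).

    Since rho is a homomorphism, this equals tr(rho(a∘b∘c^-1)), which for a
    permutation matrix counts fixed points; it equals n only when c = a∘b.
    """
    rho_ab = matmul(perm_matrix(a), perm_matrix(b))
    return {c: trace(matmul(rho_ab, perm_matrix(inverse(c)))) for c in group}

group = list(permutations(range(3)))  # the 6 elements of S_3
a, b = (1, 2, 0), (1, 0, 2)
scores = logits(a, b, group)
print(max(scores, key=scores.get))  # → (2, 1, 0), which equals compose(a, b)
```

Faithfulness is the key design choice here: with a non-faithful representation, every c in the coset ab·ker ρ would reach the same score dim ρ, so the argmax would no longer be unique.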

Related papers

Grokking Group Multiplication with Cosets [10.255744802963926]
Algorithmic tasks have proven to be a fruitful test ground for interpreting a neural network end-to-end. We completely reverse engineer fully connected one-hidden-layer networks that have "grokked" the arithmetic of the permutation groups $S_5$ and $S_6$. We describe how we reverse engineered the model's mechanisms and confirm that our theory faithfully describes the circuit's functionality.
Date: 2023-12-11T18:12:18Z
Image segmentation with traveling waves in an exactly solvable recurrent neural network [71.74150501418039]
We show that a recurrent neural network can effectively divide an image into groups according to a scene's structural characteristics. We present a precise description of the mechanism underlying object segmentation in this network. We then demonstrate a simple algorithm for object segmentation that generalizes across inputs ranging from simple geometric objects in grayscale images to natural images.
Date: 2023-11-28T16:46:44Z
Feature emergence via margin maximization: case studies in algebraic tasks [4.401622714202886]
We show that trained neural networks employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups. More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies.
Date: 2023-11-13T18:56:33Z
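The idea of composing via irreducible representations can be made concrete for the cyclic group Z_n. The sketch below is an illustrative assumption, not code from that paper. It uses the real two-dimensional rotation representation of frequency k (irreducible over the reals for 0 < k < n/2) and scores each c by tr(ρ(a)ρ(b)ρ(−c)) = 2cos(2πk(a+b−c)/n). When k is coprime to n, this score peaks exactly at c ≡ a + b (mod n). The function names are hypothetical.

```python
import math

def rotation(theta):
    """2x2 rotation matrix by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def rho(x, k, n):
    """Frequency-k representation of Z_n: x maps to rotation by 2*pi*k*x/n."""
    return rotation(2 * math.pi * k * x / n)

def logits(a, b, k, n):
    """Score each c in Z_n by tr(rho(a) rho(b) rho(-c)).

    Rotations compose by adding angles, so this equals 2*cos(2*pi*k*(a+b-c)/n).
    """
    rho_ab = matmul2(rho(a, k, n), rho(b, k, n))
    scores = []
    for c in range(n):
        m = matmul2(rho_ab, rho(-c, k, n))
        scores.append(m[0][0] + m[1][1])
    return scores

n, k = 7, 3  # k is coprime to n, so the peak is unique
scores = logits(5, 4, k, n)
print(scores.index(max(scores)))  # → 2, since (5 + 4) mod 7 = 2
```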
The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex than it first appears: even simple learning problems can admit a surprising diversity of solutions.
Date: 2023-06-30T17:59:13Z
The Neural Race Reduction: Dynamics of Abstraction in Gated Networks [12.130628846129973]
We introduce the Gated Deep Linear Network framework that schematizes how pathways of information flow impact learning dynamics. We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning. Our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures.
Date: 2022-07-21T12:01:03Z
Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation [55.80128181112308]
We show that the dimensionality and quasi-orthogonality of a neural network's feature space may jointly serve as discriminants of the network's performance. Our findings suggest important relationships between a network's final performance and properties of its randomly initialised feature space.
Date: 2022-03-30T21:47:32Z
Learning Dynamics and Structure of Complex Systems Using Graph Neural Networks [13.509027957413409]
We trained graph neural networks to fit time series from an example nonlinear dynamical system. We found simple interpretations of the learned representation and model components. We successfully identified a "graph translator" between the statistical interactions in belief propagation and the parameters of the corresponding trained network.
Date: 2022-02-22T15:58:16Z
A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation. Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.
Date: 2021-04-29T14:31:09Z
Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks. First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts. Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
Date: 2020-09-10T17:59:10Z
Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis. By assigning learnable parameters to the edges, which reflect the magnitude of connections, the learning process can be performed in a differentiable manner. This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
Date: 2020-08-19T04:53:31Z
Emergence of Network Motifs in Deep Neural Networks [0.35911228556176483]
We show that network science tools can be successfully applied to the study of artificial neural networks. In particular, we study the emergence of network motifs in multi-layer perceptrons.
Date: 2019-12-27T17:05:38Z

This list is automatically generated from the titles and abstracts of the papers on this site.