Feature emergence via margin maximization: case studies in algebraic tasks
- URL: http://arxiv.org/abs/2311.07568v2
- Date: Mon, 19 Feb 2024 17:59:29 GMT
- Title: Feature emergence via margin maximization: case studies in algebraic tasks
- Authors: Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham Kakade
- Abstract summary: We show that trained neural networks employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups.
More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies.
- Score: 4.401622714202886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the internal representations learned by neural networks is a
cornerstone challenge in the science of machine learning. While there have been
significant recent strides in some cases towards understanding how neural
networks implement specific target functions, this paper explores a
complementary question -- why do networks arrive at particular computational
strategies? Our inquiry focuses on the algebraic learning tasks of modular
addition, sparse parities, and finite group operations. Our primary theoretical
findings analytically characterize the features learned by stylized neural
networks for these algebraic tasks. Notably, our main technique demonstrates
how the principle of margin maximization alone can be used to fully specify the
features learned by the network. Specifically, we prove that the trained
networks utilize Fourier features to perform modular addition and employ
features corresponding to irreducible group-theoretic representations to
perform compositions in general groups, aligning closely with the empirical
observations of Nanda et al. and Chughtai et al. More generally, we hope our
techniques can help to foster a deeper understanding of why neural networks
adopt specific computational strategies.
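As an informal illustration of the Fourier-feature mechanism described in the abstract (this is not the paper's trained network or its exact construction; the modulus and the frequency set below are made up for the example), the following sketch shows how summing cosine features of the form cos(2πk(a + b − c)/p) over a few frequencies k already peaks at the correct answer c = (a + b) mod p:

```python
import numpy as np

p = 7                 # modulus (small, hypothetical example)
freqs = [1, 2, 3]     # an assumed subset of Fourier frequencies

def logits(a, b):
    # For every candidate answer c, sum the cosine "Fourier features"
    # cos(2*pi*k*(a + b - c)/p) over the chosen frequencies; each term
    # equals 1 only when a + b = c (mod p), so the sum is maximized there.
    c = np.arange(p)
    return sum(np.cos(2 * np.pi * k * (a + b - c) / p) for k in freqs)

a, b = 3, 6
print(int(np.argmax(logits(a, b))), (a + b) % p)  # both print 2
```

The paper's contribution is to show, via margin maximization, why a trained network prefers features of this Fourier form (and, for general groups, features built from irreducible representations), rather than merely that such features suffice.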
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Riemannian Residual Neural Networks [58.925132597945634]
We show how to extend the residual neural network (ResNet) to general Riemannian manifolds.
ResNets have become ubiquitous in machine learning due to their beneficial learning properties, excellent empirical results, and easy-to-incorporate nature when building varied neural networks.
arXiv Detail & Related papers (2023-10-16T02:12:32Z)
- Gaussian Process Surrogate Models for Neural Networks [6.8304779077042515]
In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque.
We construct a class of surrogate models for neural networks using Gaussian processes.
We demonstrate our approach captures existing phenomena related to the spectral bias of neural networks, and then show that our surrogate models can be used to solve practical problems.
arXiv Detail & Related papers (2022-08-11T20:17:02Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural Networks [13.518582483147325]
We provide a rigorous analysis of the performance of neural networks in the context of transductive inference.
We show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for block models.
arXiv Detail & Related papers (2021-12-07T20:06:23Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Emergence of Network Motifs in Deep Neural Networks [0.35911228556176483]
We show that network science tools can be successfully applied to the study of artificial neural networks.
In particular, we study the emergence of network motifs in multi-layer perceptrons.
arXiv Detail & Related papers (2019-12-27T17:05:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.