Feature emergence via margin maximization: case studies in algebraic tasks
- URL: http://arxiv.org/abs/2311.07568v2
- Date: Mon, 19 Feb 2024 17:59:29 GMT
- Title: Feature emergence via margin maximization: case studies in algebraic tasks
- Authors: Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham Kakade
- Abstract summary: We show that trained neural networks employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups.
More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies.
- Score: 4.401622714202886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the internal representations learned by neural networks is a
cornerstone challenge in the science of machine learning. While there have been
significant recent strides in some cases towards understanding how neural
networks implement specific target functions, this paper explores a
complementary question -- why do networks arrive at particular computational
strategies? Our inquiry focuses on the algebraic learning tasks of modular
addition, sparse parities, and finite group operations. Our primary theoretical
findings analytically characterize the features learned by stylized neural
networks for these algebraic tasks. Notably, our main technique demonstrates
how the principle of margin maximization alone can be used to fully specify the
features learned by the network. Specifically, we prove that the trained
networks utilize Fourier features to perform modular addition and employ
features corresponding to irreducible group-theoretic representations to
perform compositions in general groups, aligning closely with the empirical
observations of Nanda et al. and Chughtai et al. More generally, we hope our
techniques can help to foster a deeper understanding of why neural networks
adopt specific computational strategies.
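As an informal illustration of the Fourier-feature mechanism described in the abstract (this is not the paper's trained network or its exact construction; the modulus and the frequency set below are made up for the example), the following sketch shows how summing cosine features of the form cos(2πk(a + b − c)/p) over a few frequencies k already peaks at the correct answer c = (a + b) mod p:

```python
import numpy as np

p = 7                 # modulus (small, hypothetical example)
freqs = [1, 2, 3]     # an assumed subset of Fourier frequencies

def logits(a, b):
    # For every candidate answer c, sum the cosine "Fourier features"
    # cos(2*pi*k*(a + b - c)/p) over the chosen frequencies; each term
    # equals 1 only when a + b = c (mod p), so the sum is maximized there.
    c = np.arange(p)
    return sum(np.cos(2 * np.pi * k * (a + b - c) / p) for k in freqs)

a, b = 3, 6
print(int(np.argmax(logits(a, b))), (a + b) % p)  # both print 2
```

The paper's contribution is to show, via margin maximization, why a trained network prefers features of this Fourier form (and, for general groups, features built from irreducible representations), rather than merely that such features suffice.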
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Riemannian Residual Neural Networks [58.925132597945634]
We show how to extend the residual neural network (ResNet) to general Riemannian manifolds.
ResNets have become ubiquitous in machine learning due to their beneficial learning properties, excellent empirical results, and easy-to-incorporate nature when building varied neural networks.
arXiv Detail & Related papers (2023-10-16T02:12:32Z)
- Gaussian Process Surrogate Models for Neural Networks [6.8304779077042515]
In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque.
We construct a class of surrogate models for neural networks using Gaussian processes.
We demonstrate our approach captures existing phenomena related to the spectral bias of neural networks, and then show that our surrogate models can be used to solve practical problems.
arXiv Detail & Related papers (2022-08-11T20:17:02Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural Networks [13.518582483147325]
We provide a rigorous analysis of the performance of neural networks in the context of transductive inference.
We show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for block models.
arXiv Detail & Related papers (2021-12-07T20:06:23Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Emergence of Network Motifs in Deep Neural Networks [0.35911228556176483]
We show that network science tools can be successfully applied to the study of artificial neural networks.
In particular, we study the emergence of network motifs in multi-layer perceptrons.
arXiv Detail & Related papers (2019-12-27T17:05:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.