Data-driven emergence of convolutional structure in neural networks
- URL: http://arxiv.org/abs/2202.00565v1
- Date: Tue, 1 Feb 2022 17:11:13 GMT
- Title: Data-driven emergence of convolutional structure in neural networks
- Authors: Alessandro Ingrosso and Sebastian Goldt
- Abstract summary: We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
- Score: 83.4920717252233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploiting data invariances is crucial for efficient learning in both
artificial and biological neural circuits. Understanding how neural networks
can discover appropriate representations capable of harnessing the underlying
symmetries of their inputs is thus crucial in machine learning and
neuroscience. Convolutional neural networks, for example, were designed to
exploit translation symmetry and their capabilities triggered the first wave of
deep learning successes. However, learning convolutions directly from
translation-invariant data with a fully-connected network has so far proven
elusive. Here, we show how initially fully-connected neural networks solving a
discrimination task can learn a convolutional structure directly from their
inputs, resulting in localised, space-tiling receptive fields. These receptive
fields match the filters of a convolutional network trained on the same task.
By carefully designing data models for the visual scene, we show that the
emergence of this pattern is triggered by the non-Gaussian, higher-order local
structure of the inputs, which has long been recognised as the hallmark of
natural images. We provide an analytical and numerical characterisation of the
pattern-formation mechanism responsible for this phenomenon in a simple model,
which results in an unexpected link between receptive field formation and the
tensor decomposition of higher-order input correlations. These results provide
a new perspective on the development of low-level feature detectors in various
sensory modalities, and pave the way for studying the impact of higher-order
statistics on learning in neural networks.
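The pattern-formation mechanism lends itself to a compact simulation. The numpy sketch below is not the authors' code: the data model (Gaussian inputs vs covariance-matched, tanh-saturated non-Gaussian inputs on a ring) follows the paper's setup only in spirit, and all sizes and hyperparameters (D, xi, gain, K, learning rate) are illustrative assumptions. It trains a two-layer fully-connected network on the discrimination task and reports the inverse participation ratio (IPR) of each first-layer row, which rises well above the 1/D baseline when a receptive field localises.

```python
import numpy as np

rng = np.random.default_rng(0)

# Translation-invariant covariance on a ring of D pixels:
# C[i, j] = exp(-dist(i, j)^2 / (2 * xi^2)) with circular distance.
D, xi, gain = 40, 2.0, 3.0
idx = np.arange(D)
gap = np.abs(idx[:, None] - idx[None, :])
dist = np.minimum(gap, D - gap)
C = np.exp(-dist**2 / (2 * xi**2))
L = np.linalg.cholesky(C + 1e-6 * np.eye(D))

def sample(n):
    """Gaussian inputs (label -1) vs covariance-matched non-Gaussian ones (+1)."""
    z = rng.standard_normal((2 * n, D)) @ L.T
    x_g = z[:n]
    x_ng = np.tanh(gain * z[n:])           # pointwise saturation adds
    x_ng /= np.sqrt(np.mean(x_ng**2))      # higher-order local structure
    X = np.vstack([x_g, x_ng])
    y = np.concatenate([-np.ones(n), np.ones(n)])
    p = rng.permutation(2 * n)
    return X[p], y[p]

# Two-layer fully-connected network trained with plain SGD on squared loss.
K, lr = 8, 0.05
W = rng.standard_normal((K, D)) / np.sqrt(D)   # rows = receptive fields
a = rng.standard_normal(K) / np.sqrt(K)

for step in range(20000):
    X, y = sample(50)
    h = np.tanh(X @ W.T)                        # hidden activations, (batch, K)
    err = h @ a - y
    a -= lr * h.T @ err / len(y)
    W -= lr * ((err[:, None] * a) * (1 - h**2)).T @ X / len(y)

# Inverse participation ratio: ~1/D for delocalised rows, O(1) when a row
# concentrates on a few neighbouring pixels (a localised receptive field).
ipr = (W**4).sum(axis=1) / (W**2).sum(axis=1) ** 2
print("IPR per hidden unit:", np.round(ipr, 3), "baseline 1/D =", round(1 / D, 3))
```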
Related papers
- Collective variables of neural networks: empirical time evolution and scaling laws [0.535514140374842]
We show that certain measures on the spectrum of the empirical neural tangent kernel, specifically entropy and trace, yield insight into the representations learned by a neural network.
Results are demonstrated first on test cases before being shown on more complex networks, including transformers, auto-encoders, graph neural networks, and reinforcement learning studies.
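For concreteness, here is a minimal numpy sketch (our illustration, not the paper's code) of the two spectral measures named above: form the empirical NTK as the Gram matrix of per-sample parameter gradients of a toy two-layer network, then report its trace and the entropy of its normalised eigenvalue spectrum.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 20, 5, 16                      # samples, input dim, hidden width
X = rng.standard_normal((n, d))
W = rng.standard_normal((k, d)) / np.sqrt(d)
a = rng.standard_normal(k) / np.sqrt(k)

# Per-sample gradient of the scalar output f(x) = a . tanh(W x)
# with respect to all parameters, flattened into one row per sample.
H = np.tanh(X @ W.T)                                    # (n, k)
grad_a = H                                              # df/da
grad_W = (a * (1 - H**2))[:, :, None] * X[:, None, :]   # (n, k, d), df/dW
J = np.hstack([grad_a, grad_W.reshape(n, -1)])          # (n, k + k*d)

ntk = J @ J.T                            # empirical NTK Gram matrix, (n, n)
lam = np.clip(np.linalg.eigvalsh(ntk), 0, None)
p = lam / lam.sum()                      # normalised spectrum
entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
print("trace:", ntk.trace(), " spectral entropy:", entropy)
```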
arXiv Detail & Related papers (2024-10-09T21:37:14Z)
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
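A minimal sketch of the underlying representation (our own illustrative encoding, not the paper's exact construction): flatten an MLP into a graph with one node per neuron, biases as node features, and one weighted edge per connection.

```python
import numpy as np

rng = np.random.default_rng(2)
layer_sizes = [4, 8, 3]                 # illustrative MLP shape
weights = [rng.standard_normal((m, n)) for n, m in zip(layer_sizes, layer_sizes[1:])]
biases = [rng.standard_normal(m) for m in layer_sizes[1:]]

# Nodes: (layer index, neuron index, bias); input neurons get bias 0.
nodes = [(0, i, 0.0) for i in range(layer_sizes[0])]
for l, b in enumerate(biases, start=1):
    nodes += [(l, i, float(b[i])) for i in range(len(b))]

# Edges: (source id, target id, weight) for every weight-matrix entry,
# with neurons numbered globally layer by layer.
offsets = np.cumsum([0] + layer_sizes)
edges = []
for l, W in enumerate(weights):
    for i in range(W.shape[0]):         # target neuron in layer l+1
        for j in range(W.shape[1]):     # source neuron in layer l
            edges.append((offsets[l] + j, offsets[l + 1] + i, float(W[i, j])))

print(len(nodes), "nodes,", len(edges), "edges")
```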
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
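A common way to probe which statistics a network uses, sketched below under our own toy setup (illustrative, not the authors' code): build a "Gaussian clone" of a dataset that matches only its mean and covariance. Per the DSB picture, a network early in training should behave similarly on the clone and on the real data, since the two differ only in higher-order statistics.

```python
import numpy as np

def gaussian_clone(X, n_samples, rng):
    """Sample from a Gaussian matching only X's mean and covariance."""
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    return rng.multivariate_normal(mu, cov, size=n_samples)

rng = np.random.default_rng(3)
# Toy non-Gaussian data: sign patterns with heavy-tailed amplitudes.
X = np.sign(rng.standard_normal((2000, 10))) * rng.standard_exponential((2000, 10))
X_clone = gaussian_clone(X, 2000, rng)

# First two moments agree; higher-order statistics (raw fourth moments) do not.
print("mean gap:", np.abs(X.mean(0) - X_clone.mean(0)).max())
print("fourth moment, real :", np.mean(X**4))
print("fourth moment, clone:", np.mean(X_clone**4))
```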
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Robust Generalization of Quadratic Neural Networks via Function Identification [19.87036824512198]
Generalization bounds from learning theory often assume that the test distribution is close to the training distribution.
We show that for quadratic neural networks, we can identify the function represented by the model even though we cannot identify its parameters.
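The distinction between identifying the function and identifying the parameters has a compact worked form: a one-hidden-layer quadratic network computes f(x) = sum_i a_i (w_i . x)^2 = x^T M x with M = sum_i a_i w_i w_i^T, so any two parameter sets with the same M compute the same function. A minimal numpy sketch (our illustration, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(4)
d, k = 4, 6
a = rng.standard_normal(k)
W = rng.standard_normal((k, d))

def f(x, a, W):
    return a @ (W @ x) ** 2              # sum_i a_i (w_i . x)^2

# The function depends on the parameters only through M = sum_i a_i w_i w_i^T.
M = W.T @ (a[:, None] * W)

# A *different* parameter set with the same M, via its eigendecomposition:
# width d instead of k, eigenvalues as output weights.
lam, V = np.linalg.eigh(M)
a2, W2 = lam, V.T

x = rng.standard_normal(d)
print(f(x, a, W), f(x, a2, W2))          # identical outputs, different parameters
```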
arXiv Detail & Related papers (2021-09-22T18:02:00Z)
- Persistent Homology Captures the Generalization of Neural Networks Without A Validation Set [0.0]
We suggest studying the training of neural networks with Algebraic Topology, specifically Persistent Homology.
Using simplicial complex representations of neural networks, we study how the PH diagram distance evolves over the course of training.
Results show that the PH diagram distance between consecutive neural network states correlates with the validation accuracy.
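As a minimal stand-in for the full pipeline (an illustration only: zero-dimensional persistence computed from a minimum spanning tree, rather than the simplicial-complex construction the paper uses, and a crude sorted-deaths distance rather than bottleneck/Wasserstein), one can track how the H0 death times over a network's weight vectors change between consecutive training states:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def h0_deaths(points):
    """0-dim persistence death times of a point cloud = MST edge lengths."""
    D = squareform(pdist(points))
    mst = minimum_spanning_tree(D)
    return np.sort(mst.data)

rng = np.random.default_rng(5)
# Stand-ins for first-layer weight vectors at two consecutive training steps.
W_t0 = rng.standard_normal((30, 10))
W_t1 = W_t0 + 0.05 * rng.standard_normal((30, 10))   # a small SGD-like update

d0, d1 = h0_deaths(W_t0), h0_deaths(W_t1)
# Simple proxy for a diagram distance: L2 gap between sorted death times.
print("diagram distance:", np.linalg.norm(d0 - d1))
```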
arXiv Detail & Related papers (2021-05-31T09:17:31Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Malicious Network Traffic Detection via Deep Learning: An Information Theoretic View [0.0]
We study how homeomorphism affects the learned representation of a malware traffic dataset.
Our results suggest that although the details of learned representations and the specific coordinate system defined over the manifold of all parameters differ slightly, the functional approximations are the same.
arXiv Detail & Related papers (2020-09-16T15:37:44Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
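A hedged sketch of the idea (our own minimal construction, not the paper's method): order the nodes, connect every earlier node to every later one, and gate each edge with a learnable scalar passed through a sigmoid, so that connectivity strengths receive gradients like any other weight.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(6)
n_nodes, d = 5, 8
# One learnable gate per directed edge (i -> j, i < j) of the complete DAG.
gates = rng.standard_normal((n_nodes, n_nodes))

def forward(x, gates):
    """Each node's feature is a gated sum over all earlier nodes' features."""
    feats = [x]                                  # node 0 holds the input
    for j in range(1, n_nodes):
        agg = sum(sigmoid(gates[i, j]) * feats[i] for i in range(j))
        feats.append(np.tanh(agg))               # per-node transformation
    return feats[-1]

print(forward(rng.standard_normal(d), gates))
```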
arXiv Detail & Related papers (2020-08-19T04:53:31Z)