An analytic theory of shallow networks dynamics for hinge loss
classification
- URL: http://arxiv.org/abs/2006.11209v1
- Date: Fri, 19 Jun 2020 16:25:29 GMT
- Title: An analytic theory of shallow networks dynamics for hinge loss
classification
- Authors: Franco Pellegrini and Giulio Biroli
- Abstract summary: We study the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task.
We specialize our theory to the prototypical case of a linearly separable dataset and a linear hinge loss.
This allows us to address, in a simple setting, several phenomena appearing in modern networks, such as the slowing down of training dynamics, the crossover between rich and lazy learning, and overfitting.
- Score: 14.323962459195771
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks have been shown to perform incredibly well in classification
tasks over structured high-dimensional datasets. However, the learning dynamics
of such networks is still poorly understood. In this paper we study in detail
the training dynamics of a simple type of neural network: a single hidden layer
trained to perform a classification task. We show that in a suitable mean-field
limit this case maps to a single-node learning problem with a time-dependent
dataset determined self-consistently from the average nodes population. We
specialize our theory to the prototypical case of a linearly separable dataset
and a linear hinge loss, for which the dynamics can be explicitly solved. This
allows us to address in a simple setting several phenomena appearing in modern
networks, such as the slowing down of training dynamics, the crossover between
rich and lazy learning, and overfitting. Finally, we assess the limitations of
mean-field theory by studying the case of a large but finite number of nodes and
of training samples.
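To make the setting concrete, here is a minimal sketch of the setup described above. It is an illustration under assumed choices (ReLU hidden nodes, an average hinge loss max(0, 1 - y f(x)), a synthetic linearly separable dataset, and arbitrary sizes and learning rate), not the authors' code.

```python
import numpy as np

# Minimal sketch of the setting studied in the paper (illustrative, not the
# authors' code): a single hidden layer trained by gradient descent with a
# linear hinge loss on a synthetic, linearly separable dataset.
rng = np.random.default_rng(0)
d, n, m = 20, 200, 100                       # input dim, samples, hidden nodes (assumed sizes)
w_star = rng.normal(size=d)                  # teacher direction defining the labels
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)                      # linearly separable labels in {-1, +1}

W = rng.normal(size=(m, d)) / np.sqrt(d)     # hidden-layer weights
a = rng.normal(size=m) / m                   # output weights (1/m scaling, a common mean-field choice)
lr = 0.5

for step in range(2000):
    h = np.maximum(X @ W.T, 0.0)             # ReLU activations, shape (n, m)
    f = h @ a                                # network output
    active = (y * f < 1.0).astype(float)     # samples where the hinge loss is nonzero
    g = -(active * y) / n                    # d(mean hinge loss)/df, per sample
    grad_a = h.T @ g
    grad_W = (np.outer(g, a) * (h > 0)).T @ X
    a -= lr * grad_a
    W -= lr * grad_W

h = np.maximum(X @ W.T, 0.0)
print("final training error:", np.mean(np.sign(h @ a) != y))
```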
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Dynamic Analysis of Nonlinear Civil Engineering Structures using
Artificial Neural Network with Adaptive Training [2.1202971527014287]
In this study, artificial neural networks are developed with adaptive training algorithms.
The networks can successfully predict the time-history response of the shear frame and the rock structure to real ground motion records.
arXiv Detail & Related papers (2021-11-21T21:14:48Z) - Anomaly Detection on Attributed Networks via Contrastive Self-Supervised
Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z) - Over-parametrized neural networks as under-determined linear systems [31.69089186688224]
We show that it is unsurprising that simple neural networks can achieve zero training loss.
We show that kernels typically associated with the ReLU activation function have fundamental flaws.
We propose new activation functions that avoid the pitfalls of ReLU in that they admit zero-training-loss solutions for any set of distinct data points (a minimal sketch of this under-determined-system view appears after this list).
arXiv Detail & Related papers (2020-10-29T21:43:00Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, reflecting the magnitude of connections, the learning process can be performed in a differentiable manner (a sketch of this edge-gating idea appears after this list).
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - The Surprising Simplicity of the Early-Time Learning Dynamics of Neural
Networks [43.860358308049044]
In this work, we show that these common perceptions can be completely false in the early phase of learning.
We argue that this surprising simplicity can persist in networks with more layers and with convolutional architectures.
arXiv Detail & Related papers (2020-06-25T17:42:49Z) - Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup allowing for a neural network to learn both its size and topology during the course of a gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
arXiv Detail & Related papers (2020-06-22T12:46:44Z) - Finding trainable sparse networks through Neural Tangent Transfer [16.092248433189816]
In deep learning, trainable sparse networks that perform well on a specific task are usually constructed using label-dependent pruning criteria.
In this article, we introduce Neural Tangent Transfer, a method that instead finds trainable sparse networks in a label-free manner.
arXiv Detail & Related papers (2020-06-15T08:58:01Z)
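For the entry "Over-parametrized neural networks as under-determined linear systems" above, the sketch below illustrates the basic point with fixed random ReLU features: when the hidden layer is wider than the number of training points and only the output weights are fit, interpolation is an under-determined linear system, so a zero-training-loss solution exists. The random-feature construction and all sizes are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Hedged illustration (not the paper's exact construction): with more random
# hidden features than training points, fitting the output weights means
# solving an under-determined linear system H a = y, which admits an
# interpolating, zero-training-loss solution.
rng = np.random.default_rng(1)
n, d, m = 50, 10, 500                        # n samples, input dim d, width m >> n (assumed)
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)          # arbitrary labels

W = rng.normal(size=(m, d)) / np.sqrt(d)     # fixed random first layer
H = np.maximum(X @ W.T, 0.0)                 # ReLU feature matrix, shape (n, m)

a, *_ = np.linalg.lstsq(H, y, rcond=None)    # minimum-norm solution of H a = y
print("max |H a - y|:", np.max(np.abs(H @ a - y)))   # ~0: the training data is fit exactly
```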
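For the entry "Learning Connectivity of Neural Networks from a Topological Perspective" above, the sketch below shows one way learnable edge parameters can make connectivity differentiable; the GatedDAG module, the sigmoid gating, and the linear node blocks are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Hedged sketch of differentiable connectivity learning (not the paper's exact
# architecture): computation nodes are arranged in a DAG over a complete graph,
# and every incoming edge carries a learnable gate trained by backpropagation.
class GatedDAG(nn.Module):
    def __init__(self, num_nodes=4, width=16):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(width, width) for _ in range(num_nodes)])
        # one learnable logit per directed edge j -> i (j < i, including the input node)
        self.edge_logits = nn.Parameter(torch.zeros(num_nodes, num_nodes))

    def forward(self, x):
        outs = [x]                                               # node 0 is the input
        for i, block in enumerate(self.blocks):
            gates = torch.sigmoid(self.edge_logits[i, : i + 1])  # edge strengths in (0, 1)
            agg = sum(g * h for g, h in zip(gates, outs))        # gated sum over all predecessors
            outs.append(torch.relu(block(agg)))
        return outs[-1]

model = GatedDAG()
x = torch.randn(8, 16)
loss = model(x).pow(2).mean()      # placeholder objective
loss.backward()                    # gradients reach the edge gates as well as the blocks
print(model.edge_logits.grad.abs().sum() > 0)
```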