A self consistent theory of Gaussian Processes captures feature learning
effects in finite CNNs
- URL: http://arxiv.org/abs/2106.04110v1
- Date: Tue, 8 Jun 2021 05:20:00 GMT
- Title: A self consistent theory of Gaussian Processes captures feature learning
effects in finite CNNs
- Authors: Gadi Naveh and Zohar Ringel
- Abstract summary: Deep neural networks (DNNs) in the infinite width/channel limit have received much attention recently.
Despite their theoretical appeal, this viewpoint lacks a crucial ingredient of deep learning in finite DNNs, lying at the heart of their success -- feature learning.
Here we consider DNNs trained with noisy gradient descent on a large training set and derive a self consistent Gaussian Process theory accounting for strong finite-DNN and feature learning effects.
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) in the infinite width/channel limit have received
much attention recently, as they provide a clear analytical window to deep
learning via mappings to Gaussian Processes (GPs). Despite its theoretical
appeal, this viewpoint lacks a crucial ingredient of deep learning in finite
DNNs, lying at the heart of their success -- feature learning. Here we
consider DNNs trained with noisy gradient descent on a large training set and
derive a self consistent Gaussian Process theory accounting for strong
finite-DNN and feature learning effects. Applying this to a toy model of a
two-layer linear convolutional neural network (CNN) shows good agreement with
experiments. We further identify, both analytically and numerically, a sharp
transition between a feature learning regime and a lazy learning regime in this
model. Strong finite-DNN effects are also derived for a non-linear two-layer
fully connected network. Our self consistent theory provides a rich and
versatile analytical framework for studying feature learning and other non-lazy
effects in finite DNNs.
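To make the training protocol concrete, here is a minimal sketch of the kind of setup the abstract describes: noisy gradient descent (Langevin-type updates with weight decay) on a two-layer linear CNN toy model acting on non-overlapping patches. This is an illustration under assumed hyperparameters; the network sizes, learning rate, noise level, and weight decay below are arbitrary choices, not the paper's values.
```python
# Minimal sketch (assumed setup, not the paper's exact model or hyperparameters):
# noisy gradient descent with weight decay on a two-layer linear CNN.
import numpy as np

rng = np.random.default_rng(0)

N, d, C, S = 512, 16, 8, 4                        # train size, input dim, channels, filter width
P = d // S                                        # number of non-overlapping patches
X = rng.standard_normal((N, d))
w_star = rng.standard_normal(d)
y = X @ w_star / np.sqrt(d)                       # linear teacher targets

U = rng.standard_normal((C, S)) / np.sqrt(S)      # convolutional filters
a = rng.standard_normal((C, P)) / np.sqrt(C * P)  # linear readout weights

def forward(X, U, a):
    patches = X.reshape(-1, P, S)                 # (N, P, S)
    h = np.einsum('nps,cs->ncp', patches, U)      # linear conv activations
    return np.einsum('ncp,cp->n', h, a)           # scalar output per example

lr, steps, noise, wd = 1e-3, 20000, 1e-3, 1e-2    # illustrative hyperparameters
for _ in range(steps):
    patches = X.reshape(-1, P, S)
    h = np.einsum('nps,cs->ncp', patches, U)
    r = np.einsum('ncp,cp->n', h, a) - y          # residuals
    grad_a = np.einsum('n,ncp->cp', r, h) / N + wd * a
    grad_U = np.einsum('n,cp,nps->cs', r, a, patches) / N + wd * U
    # Langevin-style update: gradient step plus Gaussian noise, so the weights
    # sample from a Gibbs-like distribution rather than converging to a point.
    a += -lr * grad_a + np.sqrt(2 * lr * noise) * rng.standard_normal(a.shape)
    U += -lr * grad_U + np.sqrt(2 * lr * noise) * rng.standard_normal(U.shape)

print("train MSE:", float(np.mean((forward(X, U, a) - y) ** 2)))
```
In such toy models, the relative size of the number of channels, the training set, and the gradient noise is the kind of knob one would vary to move between the feature-learning and lazy regimes mentioned above.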
Related papers
- Speed Limits for Deep Learning [67.69149326107103]
Recent advances in thermodynamics make it possible to bound the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense; a generic form of such a speed limit is sketched after this list.
arXiv Detail & Related papers (2023-07-27T06:59:46Z) - Graph Neural Networks Provably Benefit from Structural Information: A
Feature Learning Perspective [53.999128831324576]
Graph neural networks (GNNs) have pioneered advancements in graph representation learning.
This study investigates the role of graph convolution within the context of feature learning theory.
arXiv Detail & Related papers (2023-06-24T10:21:11Z) - Quantum-Inspired Tensor Neural Networks for Partial Differential
Equations [5.963563752404561]
Deep learning methods are constrained by training time and memory. To tackle these shortcomings, we implement Tensor Neural Networks (TNNs).
We demonstrate that TNNs provide significant parameter savings while attaining the same accuracy as the classical Dense Neural Network (DNN).
arXiv Detail & Related papers (2022-08-03T17:41:11Z) - Piecewise Linear Neural Networks and Deep Learning [27.02556725989978]
PieceWise Linear Neural Networks (PWLNNs) have proven successful in various fields, most recently in deep learning.
In 1977, the canonical representation pioneered work on shallow PWLNNs learned by incremental designs.
In 2010, the Rectified Linear Unit (ReLU) drove the widespread adoption of PWLNNs in deep learning.
arXiv Detail & Related papers (2022-06-18T08:41:42Z) - Linear Leaky-Integrate-and-Fire Neuron Model Based Spiking Neural
Networks and Its Mapping Relationship to Deep Neural Networks [7.840247953745616]
Spiking neural networks (SNNs) are brain-inspired machine learning algorithms with merits such as biological plausibility and unsupervised learning capability.
This paper establishes a precise mathematical mapping between the biological parameters of the Linear Leaky-Integrate-and-Fire (LIF) model/SNNs and the parameters of ReLU-AN/Deep Neural Networks (DNNs); a minimal rate-coding illustration of this correspondence appears after this list.
arXiv Detail & Related papers (2022-05-31T17:02:26Z) - Knowledge Enhanced Neural Networks for relational domains [83.9217787335878]
We focus on a specific method, KENN, a Neural-Symbolic architecture that injects prior logical knowledge into a neural network.
In this paper, we propose an extension of KENN for relational data.
arXiv Detail & Related papers (2022-05-31T13:00:34Z) - Deep Architecture Connectivity Matters for Its Convergence: A
Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - Comparative Analysis of Interval Reachability for Robust Implicit and
Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Extended critical regimes of deep neural networks [0.0]
We show that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters.
In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers.
We provide a theoretical guide for the design of efficient neural architectures.
arXiv Detail & Related papers (2022-03-24T10:15:50Z) - How Neural Networks Extrapolate: From Feedforward to Graph Neural
Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z) - Predicting the outputs of finite deep neural networks trained with noisy
gradients [1.1470070927586014]
A recent line of work studied wide deep neural networks (DNNs) by approximating them as Gaussian Processes (GPs).
Here we consider a DNN training protocol involving noise, weight decay and finite width, whose outcome corresponds to a certain non-Gaussian process.
An analytical framework is then introduced to analyze this non-Gaussian process, whose deviation from a GP is controlled by the finite width.
arXiv Detail & Related papers (2020-04-02T18:00:01Z)
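As a companion to the Speed Limits entry above (referenced there), the block below records a generic form of the thermodynamic speed limit for overdamped Langevin dynamics: the time needed to move between two weight distributions is bounded below by a Wasserstein-2 distance divided by the dissipation. This is the standard generic statement with assumed mobility and temperature factors, not the paper's specific bound for neural networks.
```latex
% Generic thermodynamic speed limit for overdamped Langevin dynamics
% (illustrative form only; the paper's bounds may differ in constants and scope):
\begin{equation}
  \tau \;\ge\; \frac{W_2^{2}\!\left(\rho_0,\rho_\tau\right)}{\mu\,T\,\Sigma},
\end{equation}
% where \rho_0 and \rho_\tau are the initial and final weight distributions,
% W_2 is the Wasserstein-2 distance, \Sigma is the total entropy production
% over the training run, and \mu and T are the mobility and temperature.
```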
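For the LIF-to-ReLU mapping entry above (referenced there), the sketch below illustrates only the simplest version of that intuition: a non-leaky integrate-and-fire neuron under constant drive fires at a rate equal to ReLU(drive) divided by its threshold. The paper's mapping for the leaky (LIF) case is more detailed; the threshold, time step, and simulation length here are arbitrary assumptions.
```python
# Illustrative sketch only: compare the rate response of a non-leaky
# integrate-and-fire neuron with a ReLU of its constant input drive.
import numpy as np

V_TH = 1.0  # firing threshold (arbitrary)

def if_rate(drive, v_th=V_TH, dt=1e-4, T=5.0):
    """Simulated firing rate of a non-leaky integrate-and-fire neuron under constant drive."""
    v, spikes = 0.0, 0
    for _ in range(int(T / dt)):
        v += dt * drive          # pure integration of the input (no leak term)
        if v >= v_th:            # threshold crossing: emit a spike, subtract-reset
            spikes += 1
            v -= v_th
    return spikes / T

drives = np.linspace(-1.0, 4.0, 6)
sim_rates = np.array([if_rate(i) for i in drives])
relu_rates = np.maximum(0.0, drives) / V_TH   # ReLU(drive) / threshold

print("drive:      ", drives)
print("IF rate:    ", np.round(sim_rates, 2))
print("ReLU / v_th:", np.round(relu_rates, 2))
```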
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.