A self consistent theory of Gaussian Processes captures feature learning
effects in finite CNNs
- URL: http://arxiv.org/abs/2106.04110v1
- Date: Tue, 8 Jun 2021 05:20:00 GMT
- Title: A self consistent theory of Gaussian Processes captures feature learning
effects in finite CNNs
- Authors: Gadi Naveh and Zohar Ringel
- Abstract summary: Deep neural networks (DNNs) in the infinite width/channel limit have received much attention recently.
Despite their theoretical appeal, this viewpoint lacks a crucial ingredient of deep learning in finite DNNs, lying at the heart of their success -- feature learning.
Here we consider DNNs trained with noisy gradient descent on a large training set and derive a self consistent Gaussian Process theory accounting for strong finite-DNN and feature learning effects.
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) in the infinite width/channel limit have received
much attention recently, as they provide a clear analytical window to deep
learning via mappings to Gaussian Processes (GPs). Despite its theoretical
appeal, this viewpoint lacks a crucial ingredient of deep learning in finite
DNNs, lying at the heart of their success -- feature learning. Here we
consider DNNs trained with noisy gradient descent on a large training set and
derive a self consistent Gaussian Process theory accounting for strong
finite-DNN and feature learning effects. Applying this to a toy model of a
two-layer linear convolutional neural network (CNN) shows good agreement with
experiments. We further identify, both analytically and numerically, a sharp
transition between a feature learning regime and a lazy learning regime in this
model. Strong finite-DNN effects are also derived for a non-linear two-layer
fully connected network. Our self consistent theory provides a rich and
versatile analytical framework for studying feature learning and other non-lazy
effects in finite DNNs.
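To make the training protocol concrete, here is a minimal sketch of the kind of setup the abstract describes: noisy gradient descent (Langevin-type updates with weight decay) on a two-layer linear CNN toy model acting on non-overlapping patches. This is an illustration under assumed hyperparameters; the network sizes, learning rate, noise level, and weight decay below are arbitrary choices, not the paper's values.
```python
# Minimal sketch (assumed setup, not the paper's exact model or hyperparameters):
# noisy gradient descent with weight decay on a two-layer linear CNN.
import numpy as np

rng = np.random.default_rng(0)

N, d, C, S = 512, 16, 8, 4                        # train size, input dim, channels, filter width
P = d // S                                        # number of non-overlapping patches
X = rng.standard_normal((N, d))
w_star = rng.standard_normal(d)
y = X @ w_star / np.sqrt(d)                       # linear teacher targets

U = rng.standard_normal((C, S)) / np.sqrt(S)      # convolutional filters
a = rng.standard_normal((C, P)) / np.sqrt(C * P)  # linear readout weights

def forward(X, U, a):
    patches = X.reshape(-1, P, S)                 # (N, P, S)
    h = np.einsum('nps,cs->ncp', patches, U)      # linear conv activations
    return np.einsum('ncp,cp->n', h, a)           # scalar output per example

lr, steps, noise, wd = 1e-3, 20000, 1e-3, 1e-2    # illustrative hyperparameters
for _ in range(steps):
    patches = X.reshape(-1, P, S)
    h = np.einsum('nps,cs->ncp', patches, U)
    r = np.einsum('ncp,cp->n', h, a) - y          # residuals
    grad_a = np.einsum('n,ncp->cp', r, h) / N + wd * a
    grad_U = np.einsum('n,cp,nps->cs', r, a, patches) / N + wd * U
    # Langevin-style update: gradient step plus Gaussian noise, so the weights
    # sample from a Gibbs-like distribution rather than converging to a point.
    a += -lr * grad_a + np.sqrt(2 * lr * noise) * rng.standard_normal(a.shape)
    U += -lr * grad_U + np.sqrt(2 * lr * noise) * rng.standard_normal(U.shape)

print("train MSE:", float(np.mean((forward(X, U, a) - y) ** 2)))
```
In such toy models, the relative size of the number of channels, the training set, and the gradient noise is the kind of knob one would vary to move between the feature-learning and lazy regimes mentioned above.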
Related papers
- Speed Limits for Deep Learning [67.69149326107103]
Recent advances in thermodynamics make it possible to bound the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense; a generic form of such a speed limit is sketched after this list.
arXiv Detail & Related papers (2023-07-27T06:59:46Z) - Graph Neural Networks Provably Benefit from Structural Information: A
Feature Learning Perspective [53.999128831324576]
Graph neural networks (GNNs) have pioneered advancements in graph representation learning.
This study investigates the role of graph convolution within the context of feature learning theory.
arXiv Detail & Related papers (2023-06-24T10:21:11Z) - Quantum-Inspired Tensor Neural Networks for Partial Differential
Equations [5.963563752404561]
Deep learning methods are constrained by training time and memory. To tackle these shortcomings, we implement Tensor Neural Networks (TNNs).
We demonstrate that TNNs provide significant parameter savings while attaining the same accuracy as the classical Dense Neural Network (DNN).
arXiv Detail & Related papers (2022-08-03T17:41:11Z) - Piecewise Linear Neural Networks and Deep Learning [27.02556725989978]
PieceWise Linear Neural Networks (PWLNNs) have proven successful in various fields, most recently in deep learning.
In 1977, the canonical representation pioneered work on shallow PWLNNs learned by incremental designs.
In 2010, the Rectified Linear Unit (ReLU) drove the widespread adoption of PWLNNs in deep learning.
arXiv Detail & Related papers (2022-06-18T08:41:42Z) - Linear Leaky-Integrate-and-Fire Neuron Model Based Spiking Neural
Networks and Its Mapping Relationship to Deep Neural Networks [7.840247953745616]
Spiking neural networks (SNNs) are brain-inspired machine learning algorithms with merits such as biological plausibility and unsupervised learning capability.
This paper establishes a precise mathematical mapping between the biological parameters of the Linear Leaky-Integrate-and-Fire (LIF) model/SNNs and the parameters of ReLU-AN/Deep Neural Networks (DNNs); a minimal rate-coding illustration of this correspondence appears after this list.
arXiv Detail & Related papers (2022-05-31T17:02:26Z) - Knowledge Enhanced Neural Networks for relational domains [83.9217787335878]
We focus on a specific method, KENN, a Neural-Symbolic architecture that injects prior logical knowledge into a neural network.
In this paper, we propose an extension of KENN for relational data.
arXiv Detail & Related papers (2022-05-31T13:00:34Z) - Deep Architecture Connectivity Matters for Its Convergence: A
Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - Comparative Analysis of Interval Reachability for Robust Implicit and
Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Extended critical regimes of deep neural networks [0.0]
We show that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters.
In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers.
We provide a theoretical guide for the design of efficient neural architectures.
arXiv Detail & Related papers (2022-03-24T10:15:50Z) - How Neural Networks Extrapolate: From Feedforward to Graph Neural
Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z) - Predicting the outputs of finite deep neural networks trained with noisy
gradients [1.1470070927586014]
A recent line of work studied wide deep neural networks (DNNs) by approximating them as Gaussian Processes (GPs).
Here we consider a DNN training protocol involving noise, weight decay and finite width, whose outcome corresponds to a certain non-Gaussian process.
An analytical framework is then introduced to analyze this non-Gaussian process, whose deviation from a GP is controlled by the finite width.
arXiv Detail & Related papers (2020-04-02T18:00:01Z)
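As a companion to the Speed Limits entry above (referenced there), the block below records a generic form of the thermodynamic speed limit for overdamped Langevin dynamics: the time needed to move between two weight distributions is bounded below by a Wasserstein-2 distance divided by the dissipation. This is the standard generic statement with assumed mobility and temperature factors, not the paper's specific bound for neural networks.
```latex
% Generic thermodynamic speed limit for overdamped Langevin dynamics
% (illustrative form only; the paper's bounds may differ in constants and scope):
\begin{equation}
  \tau \;\ge\; \frac{W_2^{2}\!\left(\rho_0,\rho_\tau\right)}{\mu\,T\,\Sigma},
\end{equation}
% where \rho_0 and \rho_\tau are the initial and final weight distributions,
% W_2 is the Wasserstein-2 distance, \Sigma is the total entropy production
% over the training run, and \mu and T are the mobility and temperature.
```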
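For the LIF-to-ReLU mapping entry above (referenced there), the sketch below illustrates only the simplest version of that intuition: a non-leaky integrate-and-fire neuron under constant drive fires at a rate equal to ReLU(drive) divided by its threshold. The paper's mapping for the leaky (LIF) case is more detailed; the threshold, time step, and simulation length here are arbitrary assumptions.
```python
# Illustrative sketch only: compare the rate response of a non-leaky
# integrate-and-fire neuron with a ReLU of its constant input drive.
import numpy as np

V_TH = 1.0  # firing threshold (arbitrary)

def if_rate(drive, v_th=V_TH, dt=1e-4, T=5.0):
    """Simulated firing rate of a non-leaky integrate-and-fire neuron under constant drive."""
    v, spikes = 0.0, 0
    for _ in range(int(T / dt)):
        v += dt * drive          # pure integration of the input (no leak term)
        if v >= v_th:            # threshold crossing: emit a spike, subtract-reset
            spikes += 1
            v -= v_th
    return spikes / T

drives = np.linspace(-1.0, 4.0, 6)
sim_rates = np.array([if_rate(i) for i in drives])
relu_rates = np.maximum(0.0, drives) / V_TH   # ReLU(drive) / threshold

print("drive:      ", drives)
print("IF rate:    ", np.round(sim_rates, 2))
print("ReLU / v_th:", np.round(relu_rates, 2))
```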
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.