Understanding the Initial Condensation of Convolutional Neural Networks
- URL: http://arxiv.org/abs/2305.09947v1
- Date: Wed, 17 May 2023 05:00:47 GMT
- Title: Understanding the Initial Condensation of Convolutional Neural Networks
- Authors: Zhangchen Zhou, Hanxu Zhou, Yuqing Li, Zhi-Qin John Xu
- Abstract summary: Kernels of two-layer convolutional neural networks converge to one or a few directions during training.
This work represents a step towards a better understanding of the non-linear training behavior exhibited by neural networks with specialized structures.
- Score: 6.451914896767135
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Previous research has shown that fully-connected networks with small
initialization and gradient-based training methods exhibit a phenomenon known
as condensation during training. This phenomenon refers to the input weights of
hidden neurons condensing into isolated orientations during training, revealing
an implicit bias towards simple solutions in the parameter space. However, the
impact of neural network structure on condensation has not been investigated
yet. In this study, we focus on the investigation of convolutional neural
networks (CNNs). Our experiments suggest that when subjected to small
initialization and gradient-based training methods, kernel weights within the
same CNN layer also cluster together during training, demonstrating a
significant degree of condensation. Theoretically, we demonstrate that in a
finite training period, kernels of a two-layer CNN with small initialization
will converge to one or a few directions. This work represents a step towards a
better understanding of the non-linear training behavior exhibited by neural
networks with specialized structures.
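The condensation phenomenon described above can be probed empirically. Below is a minimal, hypothetical sketch (not the authors' code) that trains a small two-layer CNN from a small initialization on synthetic data and tracks the mean absolute pairwise cosine similarity between the flattened kernels of the convolutional layer; values approaching 1 indicate that the kernels have condensed onto one direction (or its opposite). The model, synthetic data, tanh activation, and hyperparameters are illustrative assumptions, not taken from the paper.

```python
# Hypothetical illustration (not the authors' code): track condensation of
# convolutional kernels during training with a small initialization.
import torch
import torch.nn as nn

torch.manual_seed(0)


class TwoLayerCNN(nn.Module):
    """A small two-layer CNN: one conv layer + global average pooling + linear readout."""

    def __init__(self, n_kernels=16, init_scale=1e-3):
        super().__init__()
        self.conv = nn.Conv2d(1, n_kernels, kernel_size=5, bias=False)
        self.fc = nn.Linear(n_kernels, 1, bias=False)
        with torch.no_grad():            # small initialization: shrink default weights
            self.conv.weight.mul_(init_scale)
            self.fc.weight.mul_(init_scale)

    def forward(self, x):
        h = torch.tanh(self.conv(x))     # (batch, n_kernels, H', W')
        return self.fc(h.mean(dim=(2, 3)))


def mean_abs_cosine(conv):
    """Mean absolute pairwise cosine similarity between flattened kernels."""
    w = conv.weight.detach().flatten(1)
    w = w / (w.norm(dim=1, keepdim=True) + 1e-12)
    cos = w @ w.t()
    off_diag = cos[~torch.eye(len(w), dtype=torch.bool)]
    return off_diag.abs().mean().item()


# Synthetic regression data; the target is an arbitrary smooth function of the input.
x = torch.randn(256, 1, 16, 16)
y = torch.tanh(x.mean(dim=(1, 2, 3))).unsqueeze(1)

model = TwoLayerCNN()
opt = torch.optim.SGD(model.parameters(), lr=0.5)   # learning rate and step count are illustrative
loss_fn = nn.MSELoss()

for step in range(2001):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:5d}  loss {loss.item():.4e}  "
              f"mean |cos| between kernels {mean_abs_cosine(model.conv):.3f}")
```

Rerunning the same sketch with a larger initialization scale (for example, init_scale=1.0) would be expected to keep the mean absolute cosine similarity well below 1, consistent with condensation being specific to the small-initialization regime.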
Related papers
- Early Directional Convergence in Deep Homogeneous Neural Networks for
Small Initializations [2.310288676109785]
This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks from small initializations.
The weights of the neural network remain small in norm and approximately converge in direction towards Karush-Kuhn-Tucker (KKT) points.
arXiv Detail & Related papers (2024-03-12T23:17:32Z) - On the dynamics of three-layer neural networks: initial condensation [2.022855152231054]
Condensation occurs when gradient methods spontaneously reduce the complexity of neural networks.
We establish the blow-up property of effective dynamics and present a sufficient condition for the occurrence of condensation.
We also explore the association between condensation and the low-rank bias observed in deep matrix factorization.
arXiv Detail & Related papers (2024-02-25T02:36:14Z) - Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias
for Correlated Inputs [5.7166378791349315]
We prove that, for the fundamental regression task of learning a single neuron, training a one-hidden layer ReLU network converges to zero loss.
We also show and characterise a surprising distinction in this setting between interpolator networks of minimal rank and those of minimal Euclidean norm.
arXiv Detail & Related papers (2023-06-10T16:36:22Z) - Phase Diagram of Initial Condensation for Two-layer Neural Networks [4.404198015660192]
We present a phase diagram of initial condensation for two-layer neural networks.
Our phase diagram serves to provide a comprehensive understanding of the dynamical regimes of neural networks.
arXiv Detail & Related papers (2023-03-12T03:55:38Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning in a reproducing kernel Banach space (RKBS).
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Towards Understanding the Condensation of Two-layer Neural Networks at
Initial Training [1.1958610985612828]
We show that the singularity of the activation function at the origin is a key factor in understanding condensation at the initial training stage.
Our experiments suggest that the maximal number of condensed orientations is twice the singularity order.
arXiv Detail & Related papers (2021-05-25T05:47:55Z) - Feature Purification: How Adversarial Training Performs Robust Deep
Learning [66.05472746340142]
We present a principle we call Feature Purification: one of the causes of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training of a neural network.
We present both experiments on the CIFAR-10 dataset illustrating this principle and a theoretical result proving that, for certain natural classification tasks, training a two-layer ReLU network with randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z) - A Generalized Neural Tangent Kernel Analysis for Two-layer Neural
Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)