Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent
- URL: http://arxiv.org/abs/2405.07619v1
- Date: Mon, 13 May 2024 10:26:28 GMT
- Title: Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent
- Authors: Michael Kohler, Adam Krzyzak, Benjamin Walter
- Abstract summary: Image classification based on over-parametrized convolutional neural networks with a global average-pooling layer is considered.
A bound on the rate of convergence of the difference between the misclassification risk of the newly introduced convolutional neural network estimate and the minimal possible value is derived.
- Score: 9.4491536689161
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Image classification based on over-parametrized convolutional neural networks with a global average-pooling layer is considered. The weights of the network are learned by gradient descent. A bound on the rate of convergence of the difference between the misclassification risk of the newly introduced convolutional neural network estimate and the minimal possible value is derived.
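In standard notation (symbols chosen here for illustration, consistent with the abstract), the bounded quantity is the excess misclassification risk of the gradient-descent-trained estimate $f_n$:

```latex
% L(f) denotes the misclassification risk of a classifier f; the minimum is over all
% measurable classifiers. The paper bounds the rate at which this difference vanishes.
\[
  L(f) = \mathbf{P}\{ f(X) \neq Y \},
  \qquad
  \mathbf{E}\, L(f_n) - \min_{f} L(f).
\]
```

A minimal Python/PyTorch sketch of the kind of estimator considered: an over-parametrized convolutional network with a global average-pooling layer, fitted by plain gradient descent. The channel width, kernel sizes, learning rate, and number of steps are illustrative assumptions, not the construction analyzed in the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_cnn(channels: int = 256, num_classes: int = 2) -> nn.Module:
    """Wide CNN followed by global average pooling; `channels` controls over-parametrization."""
    return nn.Sequential(
        nn.Conv2d(1, channels, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),   # global average pooling over the spatial dimensions
        nn.Flatten(),
        nn.Linear(channels, num_classes),
    )

# Toy data standing in for an i.i.d. image sample D_n = {(X_i, Y_i)}.
X = torch.randn(64, 1, 16, 16)
Y = torch.randint(0, 2, (64,))

net = make_cnn()
opt = torch.optim.SGD(net.parameters(), lr=0.05)   # full-batch step = plain gradient descent
loss_fn = nn.CrossEntropyLoss()

for step in range(200):                            # fixed number of gradient-descent steps
    opt.zero_grad()
    loss_fn(net(X), Y).backward()
    opt.step()

with torch.no_grad():
    err = (net(X).argmax(dim=1) != Y).float().mean().item()
print(f"empirical misclassification risk: {err:.3f}")
```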
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
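As a toy illustration of the graph-of-parameters idea summarized in the entry above, one can encode an MLP as a graph whose nodes are neurons (with biases as node features) and whose edges carry the weights. This featurization is an assumption made here for illustration, not the encoding actually used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 3))]   # toy MLP: 4 -> 8 -> 3
biases = [rng.normal(size=8), rng.normal(size=3)]

def mlp_to_graph(weights, biases):
    """Return (node_features, edge_index, edge_features) for the parameter graph."""
    layer_sizes = [weights[0].shape[0]] + [W.shape[1] for W in weights]
    offsets = np.cumsum([0] + layer_sizes)            # global node id of each layer's first neuron
    node_features = [0.0] * layer_sizes[0]            # input neurons carry no bias
    for b in biases:
        node_features.extend(b.tolist())
    edge_index, edge_features = [], []
    for layer, W in enumerate(weights):               # one edge per weight entry
        for i in range(W.shape[0]):
            for j in range(W.shape[1]):
                edge_index.append((offsets[layer] + i, offsets[layer + 1] + j))
                edge_features.append(W[i, j])
    return np.array(node_features), np.array(edge_index), np.array(edge_features)

nodes, edges, edge_feats = mlp_to_graph(weights, biases)
print(nodes.shape, edges.shape, edge_feats.shape)     # (15,) (56, 2) (56,)
```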
- Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations [2.310288676109785]
This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks.
The weights of the neural network remain small in norm and approximately converge in direction towards the Karush-Kuhn-Tucker (KKT) points.
arXiv Detail & Related papers (2024-03-12T23:17:32Z)
- On Excess Risk Convergence Rates of Neural Network Classifiers [8.329456268842227]
We study the performance of plug-in classifiers based on neural networks in a binary classification setting as measured by their excess risks.
We analyze the estimation and approximation properties of neural networks to obtain a dimension-free, uniform rate of convergence.
arXiv Detail & Related papers (2023-09-26T17:14:10Z)
- Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z)
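A rough sketch of the gradient-scaling idea from the "Reparameterization through Spatial Gradient Scaling" entry above: rescale a convolutional kernel's gradient position-wise through a backward hook. The specific scaling factors below are placeholders, not the scaling rule proposed in that paper.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# One scaling factor per spatial kernel position (3x3), broadcast over the channel dims.
spatial_scale = torch.tensor([[0.5, 1.0, 0.5],
                              [1.0, 2.0, 1.0],
                              [0.5, 1.0, 0.5]])

# The hook multiplies the weight gradient element-wise during backpropagation,
# redistributing learning focus among the kernel's spatial positions.
conv.weight.register_hook(lambda grad: grad * spatial_scale)

x = torch.randn(4, 3, 16, 16)
conv(x).pow(2).mean().backward()
print(conv.weight.grad.shape)   # torch.Size([8, 3, 3, 3]), gradient rescaled per position
```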
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the variance of the random initialization is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z)
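A hedged sketch related to the "Implicit Bias in Leaky ReLU Networks" entry above: take one gradient-descent step from a small random initialization and inspect the numerical rank of the first-layer weight matrix. The dimensions, step size, tolerance, loss, and data below are illustrative assumptions, not the regime analyzed in that paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, m, n = 200, 50, 20                     # input dimension much larger than sample size
X = torch.randn(n, d) / d ** 0.5
y = torch.sign(torch.randn(n, 1))

net = nn.Sequential(nn.Linear(d, m, bias=False), nn.LeakyReLU(0.1), nn.Linear(m, 1, bias=False))
with torch.no_grad():                     # shrink the default initialization
    for p in net.parameters():
        p.mul_(0.01)

def first_layer_rank(model, rtol=1e-3):
    return torch.linalg.matrix_rank(model[0].weight.detach(), rtol=rtol).item()

print("numerical rank before:", first_layer_rank(net))
opt = torch.optim.SGD(net.parameters(), lr=1.0)
((net(X) - y) ** 2).mean().backward()     # one full-batch gradient step on squared loss
opt.step()
print("numerical rank after one step:", first_layer_rank(net))
```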
- Limitations of neural network training due to numerical instability of backpropagation [2.255961793913651]
We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute gradients.
It is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers.
We conclude that approximating sequences of ReLU neural networks resulting from gradient descent in practice differ substantially from theoretically constructed sequences.
arXiv Detail & Related papers (2022-10-03T10:34:38Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
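For the "Effective Number of Linear Regions" entry above, a small illustration of how one might count the linear regions actually realized by a one-hidden-layer univariate ReLU network $f(x)=\sum_i v_i\,\mathrm{ReLU}(w_i x + b_i)$. Merging adjacent intervals with equal slope is one simple reading of "effective" regions and is an assumption here, not the paper's definition.

```python
import numpy as np

rng = np.random.default_rng(0)
width = 50
w, b, v = rng.normal(size=width), rng.normal(size=width), rng.normal(size=width)

def effective_linear_regions(w, b, v, tol=1e-8):
    """Count maximal intervals on which f is affine with a distinct slope."""
    active = np.abs(w) > tol
    knots = np.sort(-b[active] / w[active])                 # candidate breakpoints x = -b_i / w_i
    probes = np.concatenate(([knots[0] - 1.0],
                             (knots[:-1] + knots[1:]) / 2,
                             [knots[-1] + 1.0]))            # one point inside each interval
    slopes = np.array([np.sum(v * w * ((w * x + b) > 0)) for x in probes])
    return 1 + int(np.sum(np.abs(np.diff(slopes)) > tol))   # merge neighbours with equal slope

print("effective linear regions:", effective_linear_regions(w, b, v))
```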
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
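A much-simplified, hedged stand-in for the decoupling described in the "Local Critic Training" entry above: each layer group is updated from its own small auxiliary head, so no gradient crosses group boundaries. The paper's actual critic networks (which approximate the final loss) are not reproduced here; the heads, sizes, and optimizer settings are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
groups = nn.ModuleList([
    nn.Sequential(nn.Linear(20, 64), nn.ReLU()),
    nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
    nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
])
heads = nn.ModuleList([nn.Linear(64, 2) for _ in groups])     # one local head per group
opts = [torch.optim.SGD(list(g.parameters()) + list(h.parameters()), lr=0.1)
        for g, h in zip(groups, heads)]
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(128, 20)
Y = torch.randint(0, 2, (128,))

for step in range(100):
    h = X
    for g, head, opt in zip(groups, heads, opts):
        h = g(h.detach())                 # detach: gradients never reach earlier groups
        opt.zero_grad()
        loss_fn(head(h), Y).backward()    # each group learns only from its local loss
        opt.step()

with torch.no_grad():
    h = X
    for g in groups:
        h = g(h)
    acc = (heads[-1](h).argmax(dim=1) == Y).float().mean().item()
print(f"training accuracy via the last local head: {acc:.2f}")
```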
- Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
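For the "Optimal Rates for Averaged Stochastic Gradient Descent" entry above, a minimal sketch of Polyak-Ruppert iterate averaging on a wide two-layer network; the architecture, data, step size, and batch size are illustrative assumptions, not the NTK-regime setting analyzed in that paper.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(5, 2048), nn.ReLU(), nn.Linear(2048, 1))  # wide two-layer network
avg_net = copy.deepcopy(net)                                            # stores the averaged iterate
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

X = torch.randn(256, 5)
y = torch.sin(X.sum(dim=1, keepdim=True))            # toy regression target

for t in range(1, 501):
    i = torch.randint(0, X.shape[0], (32,))           # mini-batch stochastic gradient step
    opt.zero_grad()
    loss_fn(net(X[i]), y[i]).backward()
    opt.step()
    with torch.no_grad():                             # running average: bar_w_t = ((t-1)*bar_w_{t-1} + w_t) / t
        for p_avg, p in zip(avg_net.parameters(), net.parameters()):
            p_avg.mul_((t - 1) / t).add_(p, alpha=1.0 / t)

with torch.no_grad():
    print("last-iterate loss:    ", loss_fn(net(X), y).item())
    print("averaged-iterate loss:", loss_fn(avg_net(X), y).item())
```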
- Improving correlation method with convolutional neural networks [0.0]
We present a convolutional neural network for the classification of correlation responses obtained by correlation filters.
The proposed approach can improve the accuracy of classification, as well as achieve invariance to the image classes and parameters.
arXiv Detail & Related papers (2020-04-20T16:36:01Z)
- On the rate of convergence of image classifiers based on convolutional neural networks [0.0]
The rate of convergence of the misclassification risk of the estimates towards the optimal misclassification risk is analyzed.
This proves that in image classification it is possible to circumvent the curse of dimensionality by convolutional neural networks.
arXiv Detail & Related papers (2020-03-03T14:24:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.