Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent
- URL: http://arxiv.org/abs/2405.07619v1
- Date: Mon, 13 May 2024 10:26:28 GMT
- Title: Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent
- Authors: Michael Kohler, Adam Krzyzak, Benjamin Walter
- Abstract summary: Image classification based on over-parametrized convolutional neural networks with a global average-pooling layer is considered.
A bound on the rate of convergence of the difference between the misclassification risk of the newly introduced convolutional neural network estimate and the minimal possible value is derived.
- Score: 9.4491536689161
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract:   Image classification based on over-parametrized convolutional neural networks with a global average-pooling layer is considered. The weights of the network are learned by gradient descent. A bound on the rate of convergence of the difference between the misclassification risk of the newly introduced convolutional neural network estimate and the minimal possible value is derived. 
 
      
        Related papers
        - Convolutional Spiking Neural Network for Image Classification [0.0]
 We consider an implementation of convolutional architecture in a spiking neural network (SNN) used to classify images.
As in the traditional neural network, the convolutional layers form informational "features" used as predictors in the SNN-based classifier with CoLaNET architecture.
 arXiv  Detail & Related papers  (2025-05-13T12:47:13Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
 We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
 arXiv  Detail & Related papers  (2024-03-18T18:01:01Z)
- Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations [2.310288676109785]
 This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks.
The weights of the neural network remain small in norm and approximately converge in direction along the Karush-Kuhn-Tucker points.
 arXiv  Detail & Related papers  (2024-03-12T23:17:32Z)
- On Excess Risk Convergence Rates of Neural Network Classifiers [8.329456268842227]
 We study the performance of plug-in classifiers based on neural networks in a binary classification setting as measured by their excess risks.
We analyze the estimation and approximation properties of neural networks to obtain a dimension-free, uniform rate of convergence.
 arXiv  Detail & Related papers  (2023-09-26T17:14:10Z)
- Reparameterization through Spatial Gradient Scaling [69.27487006953852]
 Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
 arXiv  Detail & Related papers  (2023-03-05T17:57:33Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
 We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
 arXiv  Detail & Related papers  (2023-02-01T03:18:07Z)
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
 In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
 arXiv  Detail & Related papers  (2022-10-13T15:09:54Z)
- Limitations of neural network training due to numerical instability of backpropagation [2.255961793913651]
 We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute gradients.
It is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers.
We conclude that approximating sequences of ReLU neural networks resulting from gradient descent in practice differ substantially from theoretically constructed sequences.
 arXiv  Detail & Related papers  (2022-10-03T10:34:38Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
 We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
 arXiv  Detail & Related papers  (2022-05-18T16:57:10Z)
- Approximation bounds for norm constrained neural networks with applications to regression and GANs [9.645327615996914]
 We prove upper and lower bounds on the approximation error of ReLU neural networks with norm constraint on the weights.
We apply these approximation bounds to analyze the convergence of regression using norm constrained neural networks and of distribution estimation by GANs.
 arXiv  Detail & Related papers  (2022-01-24T02:19:05Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
 We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
 arXiv  Detail & Related papers  (2021-02-03T09:30:45Z)
- Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
 We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
 arXiv  Detail & Related papers  (2020-06-22T14:31:37Z)
- Improving correlation method with convolutional neural networks [0.0]
 We present a convolutional neural network for the classification of correlation responses obtained by correlation filters.
The proposed approach can improve the accuracy of classification, as well as achieve invariance to the image classes and parameters.
 arXiv  Detail & Related papers  (2020-04-20T16:36:01Z)
- On the rate of convergence of image classifiers based on convolutional neural networks [0.0]
 The rate of convergence of the misclassification risk of the estimates towards the optimal misclassification risk is analyzed.
This proves that in image classification it is possible to circumvent the curse of dimensionality by convolutional neural networks.
 arXiv  Detail & Related papers  (2020-03-03T14:24:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     