Non-asymptotic Excess Risk Bounds for Classification with Deep
Convolutional Neural Networks
- URL: http://arxiv.org/abs/2105.00292v1
- Date: Sat, 1 May 2021 15:55:04 GMT
- Title: Non-asymptotic Excess Risk Bounds for Classification with Deep
Convolutional Neural Networks
- Authors: Guohao Shen, Yuling Jiao, Yuanyuan Lin and Jian Huang
- Abstract summary: We consider the problem of binary classification with a class of general deep convolutional neural networks.
We define the prefactors of the risk bounds in terms of the input data dimension and other model parameters.
We show that the classification methods with CNNs can circumvent the curse of dimensionality.
- Score: 6.051520664893158
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we consider the problem of binary classification with a class
of general deep convolutional neural networks, which includes fully-connected
neural networks and fully convolutional neural networks as special cases. We
establish non-asymptotic excess risk bounds for a class of convex surrogate
losses and target functions with different modulus of continuity. An important
feature of our results is that we clearly define the prefactors of the risk
bounds in terms of the input data dimension and other model parameters and show
that they depend polynomially on the dimensionality in some important models.
We also show that the classification methods with CNNs can circumvent the curse
of dimensionality if the input data is supported on an approximate
low-dimensional manifold. To establish these results, we derive an upper bound
for the covering number for the class of general convolutional neural networks
with a bias term in each convolutional layer, and derive new results on the
approximation power of CNNs for any uniformly-continuous target functions.
These results provide further insights into the complexity and the
approximation power of general convolutional neural networks, which are of
independent interest and may have other applications. Finally, we apply our
general results to analyze the non-asymptotic excess risk bounds for four
widely used methods with different loss functions using CNNs, including the
least squares, the logistic, the exponential and the SVM hinge losses.
Related papers
- Compositional Curvature Bounds for Deep Neural Networks [7.373617024876726]
A key challenge that threatens the widespread use of neural networks in safety-critical applications is their vulnerability to adversarial attacks.
We study the second-order behavior of continuously differentiable deep neural networks, focusing on robustness against adversarial perturbations.
We introduce a novel algorithm to analytically compute provable upper bounds on the second derivative of neural networks.
arXiv Detail & Related papers (2024-06-07T17:50:15Z) - Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss [2.07180164747172]
We compare deeper neural networks (DeNNs) with a flexible number of layers and wider neural networks (WeNNs) with limited hidden layers.
We find that a higher number of parameters tends to favor WeNNs, while an increased number of sample points and greater regularity in the loss function lean towards the adoption of DeNNs.
arXiv Detail & Related papers (2024-01-31T20:10:10Z) - Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over- parameterization, where the width is $tildemathcalO(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Optimal Learning Rates of Deep Convolutional Neural Networks: Additive
Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks.
We show that, for additive ridge functions, convolutional neural networks followed by one fully connected layer with ReLU activation functions can reach optimal mini-max rates.
arXiv Detail & Related papers (2022-02-24T14:22:32Z) - Critical Initialization of Wide and Deep Neural Networks through Partial
Jacobians: General Theory and Applications [6.579523168465526]
We introduce emphpartial Jacobians of a network, defined as derivatives of preactivations in layer $l$ with respect to preactivations in layer $l_0leq l$.
We derive recurrence relations for the norms of partial Jacobians and utilize these relations to analyze criticality of deep fully connected neural networks with LayerNorm and/or residual connections.
arXiv Detail & Related papers (2021-11-23T20:31:42Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
Presence of sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z) - Towards a mathematical framework to inform Neural Network modelling via
Polynomial Regression [0.0]
It is shown that almost identical predictions can be made when certain conditions are met locally.
When learning from generated data, the proposed method producess that approximate correctly the data locally.
arXiv Detail & Related papers (2021-02-07T17:56:16Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.