Sharp Rate of Convergence for Deep Neural Network Classifiers under the
Teacher-Student Setting
- URL: http://arxiv.org/abs/2001.06892v2
- Date: Sat, 1 Feb 2020 04:58:57 GMT
- Title: Sharp Rate of Convergence for Deep Neural Network Classifiers under the
Teacher-Student Setting
- Authors: Tianyang Hu, Zuofeng Shang, Guang Cheng
- Abstract summary: Classifiers built with neural networks handle large-scale high dimensional data, such as facial images from computer vision, extremely well.
In this paper, we attempt to understand this empirical success in high dimensional classification by deriving the convergence rates of excess risk.
- Score: 20.198224461384854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifiers built with neural networks handle large-scale high dimensional
data, such as facial images from computer vision, extremely well while
traditional statistical methods often fail miserably. In this paper, we attempt
to understand this empirical success in high dimensional classification by
deriving the convergence rates of excess risk. In particular, a teacher-student
framework is proposed that assumes the Bayes classifier to be expressed as ReLU
neural networks. In this setup, we obtain a sharp rate of convergence, i.e.,
$\tilde{O}_d(n^{-2/3})$, for classifiers trained using either 0-1 loss or hinge
loss. This rate can be further improved to $\tilde{O}_d(n^{-1})$ when the data
distribution is separable. Here, $n$ denotes the sample size. An interesting
observation is that the data dimension only contributes to the $\log(n)$ term
in the above rates. This may provide one theoretical explanation for the
empirical successes of deep neural networks in high dimensional classification,
particularly for structured data.
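To make the teacher-student setting concrete, below is a minimal sketch of the separable (noiseless) case: a fixed one-hidden-layer ReLU "teacher" plays the role of the Bayes classifier, and a one-hidden-layer ReLU "student" is fit by subgradient descent on the hinge loss. All sizes (input dimension, widths, learning rate, epochs) are illustrative assumptions, and the shallow architecture and plain optimizer are simplifications of the deep ReLU classes and empirical risk minimization analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Teacher ReLU network: its sign stands in for the Bayes classifier.
d, m_teacher = 20, 8                       # input dim and teacher width (illustrative)
Wt = rng.normal(size=(m_teacher, d))
bt = rng.normal(size=m_teacher)
at = rng.normal(size=m_teacher)

def teacher(X):
    return np.maximum(X @ Wt.T + bt, 0.0) @ at

# Noiseless labels from the teacher's sign: the separable case of the paper.
n = 2000
X = rng.normal(size=(n, d))
y = np.sign(teacher(X))
y[y == 0] = 1.0

# Student: one-hidden-layer ReLU network trained with the hinge loss.
m, lr, epochs = 64, 0.05, 500              # illustrative hyperparameters
W = rng.normal(scale=0.5, size=(m, d))
b = np.zeros(m)
a = rng.normal(scale=0.5, size=m)

for _ in range(epochs):
    H = X @ W.T + b                        # pre-activations, shape (n, m)
    A = np.maximum(H, 0.0)                 # hidden activations
    f = A @ a                              # student scores
    g = np.where(y * f < 1.0, -y, 0.0) / n # hinge-loss subgradient w.r.t. f
    grad_a = A.T @ g
    G = np.outer(g, a) * (H > 0)           # backprop through the ReLU
    W -= lr * (G.T @ X)
    b -= lr * G.sum(axis=0)
    a -= lr * grad_a

def student(Z):
    return np.maximum(Z @ W.T + b, 0.0) @ a

# With noiseless labels the Bayes risk is zero, so held-out 0-1 error
# approximates the excess risk whose decay in n the paper characterizes.
Xte = rng.normal(size=(10000, d))
yte = np.sign(teacher(Xte)); yte[yte == 0] = 1.0
print("train 0-1 error:", np.mean(np.sign(student(X)) != y))
print("test 0-1 error (approx. excess risk):", np.mean(np.sign(student(Xte)) != yte))
```

Repeating this for increasing $n$ and plotting the held-out error against $n$ is one way to eyeball a rate such as $\tilde{O}_d(n^{-1})$ in the separable case, though matching the theory exactly would require the estimator and network class studied in the paper.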
Related papers
- On Excess Risk Convergence Rates of Neural Network Classifiers [8.329456268842227]
We study the performance of plug-in classifiers based on neural networks in a binary classification setting as measured by their excess risks.
We analyze the estimation and approximation properties of neural networks to obtain a dimension-free, uniform rate of convergence.
arXiv Detail & Related papers (2023-09-26T17:14:10Z)
- Wide and Deep Neural Networks Achieve Optimality for Classification [23.738242876364865]
We identify and construct an explicit set of neural network classifiers that achieve optimality.
In particular, we provide explicit activation functions that can be used to construct networks that achieve optimality.
Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
arXiv Detail & Related papers (2022-04-29T14:27:42Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an equiangular tight frame (ETF) and fixed during training.
Our experimental results show that our method is able to achieve similar performance on image classification for balanced datasets.
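As a companion to this entry, the following is a minimal sketch of constructing a simplex equiangular tight frame and freezing it as the classification layer; the class count and feature dimension are illustrative assumptions, and the paper's exact training recipe is not reproduced here.

```python
import numpy as np

def simplex_etf(num_classes: int, feat_dim: int, seed: int = 0) -> np.ndarray:
    """Rows of the returned (num_classes, feat_dim) matrix form a simplex ETF:
    unit norm, pairwise inner product -1/(num_classes - 1)."""
    assert feat_dim >= num_classes
    rng = np.random.default_rng(seed)
    # Random partial orthonormal basis U with U^T U = I_K.
    U, _ = np.linalg.qr(rng.normal(size=(feat_dim, num_classes)))
    K = num_classes
    M = np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)
    return M.T

W_cls = simplex_etf(num_classes=10, feat_dim=64)   # frozen; never updated
print(np.round(W_cls @ W_cls.T, 3))                # 1 on the diagonal, -1/9 off it

# In the fixed-classifier setting only the feature extractor is trained;
# logits for a batch of 64-dimensional features are simply features @ W_cls.T.
```

Because the class vectors are equally and maximally separated, removing the learnable last layer in this way treats all classes symmetrically, which is one intuition behind the comparable accuracy reported above on balanced datasets.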
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by examining the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two disjoint classes of data linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representations can achieve improved sample complexity compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
- OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer [77.90012156266324]
This paper aims to find a subspace of neural networks that can facilitate a large decision margin.
We propose the Orthogonal Softmax Layer (OSL), which makes the weight vectors in the classification layer remain orthogonal during both the training and test processes.
Experimental results demonstrate that the proposed OSL has better performance than the methods used for comparison on four small-sample benchmark datasets.
arXiv Detail & Related papers (2020-04-20T02:41:01Z)
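For the OSLNet entry above, the sketch below shows one simple way to realize a classification layer whose weight vectors stay orthogonal throughout training and testing: fix an orthonormal set of class vectors and never update it, training only the layers that produce the features. This is an illustrative construction, not necessarily the paper's exact OSL design, and the dimensions are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, num_classes = 64, 10                    # placeholder sizes

# Draw a random orthogonal basis once and keep the first K directions fixed.
Q, _ = np.linalg.qr(rng.normal(size=(feat_dim, feat_dim)))
W_cls = Q[:, :num_classes].T                      # (num_classes, feat_dim), orthonormal rows

def classify(features: np.ndarray) -> np.ndarray:
    """Softmax over logits from the fixed, orthogonal classification layer."""
    logits = features @ W_cls.T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Orthogonality holds by construction and is preserved because W_cls is frozen.
print(np.allclose(W_cls @ W_cls.T, np.eye(num_classes)))   # True
```

Keeping the class vectors mutually orthogonal enforces a fixed 90-degree angle between classifier directions, the kind of large decision margin the entry above targets for small-sample classification.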
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.