Sharp Rate of Convergence for Deep Neural Network Classifiers under the
Teacher-Student Setting
- URL: http://arxiv.org/abs/2001.06892v2
- Date: Sat, 1 Feb 2020 04:58:57 GMT
- Title: Sharp Rate of Convergence for Deep Neural Network Classifiers under the
Teacher-Student Setting
- Authors: Tianyang Hu, Zuofeng Shang, Guang Cheng
- Abstract summary: Classifiers built with neural networks handle large-scale high dimensional data, such as facial images from computer vision, extremely well.
In this paper, we attempt to understand this empirical success in high dimensional classification by deriving the convergence rates of excess risk.
- Score: 20.198224461384854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifiers built with neural networks handle large-scale high dimensional
data, such as facial images from computer vision, extremely well while
traditional statistical methods often fail miserably. In this paper, we attempt
to understand this empirical success in high dimensional classification by
deriving the convergence rates of excess risk. In particular, a teacher-student
framework is proposed that assumes the Bayes classifier to be expressed as ReLU
neural networks. In this setup, we obtain a sharp rate of convergence, i.e.,
$\tilde{O}_d(n^{-2/3})$, for classifiers trained using either 0-1 loss or hinge
loss. This rate can be further improved to $\tilde{O}_d(n^{-1})$ when the data
distribution is separable. Here, $n$ denotes the sample size. An interesting
observation is that the data dimension only contributes to the $\log(n)$ term
in the above rates. This may provide one theoretical explanation for the
empirical successes of deep neural networks in high dimensional classification,
particularly for structured data.
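To make the teacher-student setting concrete, below is a minimal sketch of the separable (noiseless) case: a fixed one-hidden-layer ReLU "teacher" plays the role of the Bayes classifier, and a one-hidden-layer ReLU "student" is fit by subgradient descent on the hinge loss. All sizes (input dimension, widths, learning rate, epochs) are illustrative assumptions, and the shallow architecture and plain optimizer are simplifications of the deep ReLU classes and empirical risk minimization analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Teacher ReLU network: its sign stands in for the Bayes classifier.
d, m_teacher = 20, 8                       # input dim and teacher width (illustrative)
Wt = rng.normal(size=(m_teacher, d))
bt = rng.normal(size=m_teacher)
at = rng.normal(size=m_teacher)

def teacher(X):
    return np.maximum(X @ Wt.T + bt, 0.0) @ at

# Noiseless labels from the teacher's sign: the separable case of the paper.
n = 2000
X = rng.normal(size=(n, d))
y = np.sign(teacher(X))
y[y == 0] = 1.0

# Student: one-hidden-layer ReLU network trained with the hinge loss.
m, lr, epochs = 64, 0.05, 500              # illustrative hyperparameters
W = rng.normal(scale=0.5, size=(m, d))
b = np.zeros(m)
a = rng.normal(scale=0.5, size=m)

for _ in range(epochs):
    H = X @ W.T + b                        # pre-activations, shape (n, m)
    A = np.maximum(H, 0.0)                 # hidden activations
    f = A @ a                              # student scores
    g = np.where(y * f < 1.0, -y, 0.0) / n # hinge-loss subgradient w.r.t. f
    grad_a = A.T @ g
    G = np.outer(g, a) * (H > 0)           # backprop through the ReLU
    W -= lr * (G.T @ X)
    b -= lr * G.sum(axis=0)
    a -= lr * grad_a

def student(Z):
    return np.maximum(Z @ W.T + b, 0.0) @ a

# With noiseless labels the Bayes risk is zero, so held-out 0-1 error
# approximates the excess risk whose decay in n the paper characterizes.
Xte = rng.normal(size=(10000, d))
yte = np.sign(teacher(Xte)); yte[yte == 0] = 1.0
print("train 0-1 error:", np.mean(np.sign(student(X)) != y))
print("test 0-1 error (approx. excess risk):", np.mean(np.sign(student(Xte)) != yte))
```

Repeating this for increasing $n$ and plotting the held-out error against $n$ is one way to eyeball a rate such as $\tilde{O}_d(n^{-1})$ in the separable case, though matching the theory exactly would require the estimator and network class studied in the paper.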
Related papers
- On Excess Risk Convergence Rates of Neural Network Classifiers [8.329456268842227]
We study the performance of plug-in classifiers based on neural networks in a binary classification setting as measured by their excess risks.
We analyze the estimation and approximation properties of neural networks to obtain a dimension-free, uniform rate of convergence.
arXiv Detail & Related papers (2023-09-26T17:14:10Z)
- Wide and Deep Neural Networks Achieve Optimality for Classification [23.738242876364865]
We identify and construct an explicit set of neural network classifiers that achieve optimality.
In particular, we provide explicit activation functions that can be used to construct networks that achieve optimality.
Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
arXiv Detail & Related papers (2022-04-29T14:27:42Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an equiangular tight frame (ETF) and fixed during training.
Our experimental results show that our method is able to achieve similar performance on image classification for balanced datasets.
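As a companion to this entry, the following is a minimal sketch of constructing a simplex equiangular tight frame and freezing it as the classification layer; the class count and feature dimension are illustrative assumptions, and the paper's exact training recipe is not reproduced here.

```python
import numpy as np

def simplex_etf(num_classes: int, feat_dim: int, seed: int = 0) -> np.ndarray:
    """Rows of the returned (num_classes, feat_dim) matrix form a simplex ETF:
    unit norm, pairwise inner product -1/(num_classes - 1)."""
    assert feat_dim >= num_classes
    rng = np.random.default_rng(seed)
    # Random partial orthonormal basis U with U^T U = I_K.
    U, _ = np.linalg.qr(rng.normal(size=(feat_dim, num_classes)))
    K = num_classes
    M = np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)
    return M.T

W_cls = simplex_etf(num_classes=10, feat_dim=64)   # frozen; never updated
print(np.round(W_cls @ W_cls.T, 3))                # 1 on the diagonal, -1/9 off it

# In the fixed-classifier setting only the feature extractor is trained;
# logits for a batch of 64-dimensional features are simply features @ W_cls.T.
```

Because the class vectors are equally and maximally separated, removing the learnable last layer in this way treats all classes symmetrically, which is one intuition behind the comparable accuracy reported above on balanced datasets.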
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by examining the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two disjoint classes of data linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representations can achieve improved sample complexity compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
- OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer [77.90012156266324]
This paper aims to find a subspace of neural networks that can facilitate a large decision margin.
We propose the Orthogonal Softmax Layer (OSL), which makes the weight vectors in the classification layer remain orthogonal during both the training and test processes.
Experimental results demonstrate that the proposed OSL has better performance than the methods used for comparison on four small-sample benchmark datasets.
arXiv Detail & Related papers (2020-04-20T02:41:01Z)
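For the OSLNet entry above, the sketch below shows one simple way to realize a classification layer whose weight vectors stay orthogonal throughout training and testing: fix an orthonormal set of class vectors and never update it, training only the layers that produce the features. This is an illustrative construction, not necessarily the paper's exact OSL design, and the dimensions are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, num_classes = 64, 10                    # placeholder sizes

# Draw a random orthogonal basis once and keep the first K directions fixed.
Q, _ = np.linalg.qr(rng.normal(size=(feat_dim, feat_dim)))
W_cls = Q[:, :num_classes].T                      # (num_classes, feat_dim), orthonormal rows

def classify(features: np.ndarray) -> np.ndarray:
    """Softmax over logits from the fixed, orthogonal classification layer."""
    logits = features @ W_cls.T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Orthogonality holds by construction and is preserved because W_cls is frozen.
print(np.allclose(W_cls @ W_cls.T, np.eye(num_classes)))   # True
```

Keeping the class vectors mutually orthogonal enforces a fixed 90-degree angle between classifier directions, the kind of large decision margin the entry above targets for small-sample classification.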
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.