A Theoretical-Empirical Approach to Estimating Sample Complexity of DNNs
- URL: http://arxiv.org/abs/2105.01867v1
- Date: Wed, 5 May 2021 05:14:08 GMT
- Title: A Theoretical-Empirical Approach to Estimating Sample Complexity of DNNs
- Authors: Devansh Bisla, Apoorva Nandini Saridena, Anna Choromanska
- Abstract summary: This paper focuses on understanding how the generalization error scales with the amount of the training data for deep neural networks (DNNs).
We derive estimates of the generalization error that hold for deep networks and do not rely on unattainable capacity measures.
- Score: 11.152761263415046
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper focuses on understanding how the generalization error scales with
the amount of the training data for deep neural networks (DNNs). Existing
techniques in statistical learning require computation of capacity measures,
such as VC dimension, to provably bound this error. It is however unclear how
to extend these measures to DNNs, and the existing analyses therefore apply
only to simple neural networks that are not used in practice, e.g., linear or
shallow networks, or at best multi-layer perceptrons. Moreover, many
theoretical error bounds are not empirically verifiable. We derive estimates of
the generalization error that hold for deep networks and do not rely on
unattainable capacity measures. The enabling technique in our approach hinges
on two major assumptions: i) the network achieves zero training error, ii) the
probability of making an error on a test point is proportional to the distance
between this point and its nearest training point in the feature space and at a
certain maximal distance (that we call radius) it saturates. Based on these
assumptions we estimate the generalization error of DNNs. The obtained estimate
scales as O(1/(\delta N^{1/d})), where N is the size of the training data and
is parameterized by two quantities, the effective dimensionality of the data as
perceived by the network (d) and the aforementioned radius (\delta), both of
which we find empirically. We show that our estimates match with the
experimentally obtained behavior of the error on multiple learning tasks using
benchmark datasets and realistic models. Estimating training data requirements
is essential for deploying safety-critical applications such as autonomous
driving. Furthermore, collecting and annotating training data requires a
huge amount of financial, computational and human resources. Our empirical
estimates will help to efficiently allocate resources.
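The scaling law in the abstract can be illustrated with a small sketch. Since err(N) ~ 1/(delta * N^(1/d)) is linear in log-log space, the two empirical quantities d (effective dimensionality) and delta (radius) can be recovered by a least-squares fit to measured (N, error) pairs. The data points below are purely illustrative, not taken from the paper, and the fitting procedure is a minimal assumption of how such an estimate might be obtained in practice:

```python
import numpy as np

# Hypothetical measurements: training-set sizes and corresponding test errors.
# (Illustrative numbers only, not results from the paper.)
N = np.array([1_000, 2_000, 4_000, 8_000, 16_000], dtype=float)
err = np.array([0.30, 0.25, 0.21, 0.175, 0.147])

# err(N) ~ 1 / (delta * N^(1/d)) implies, in log-log space:
#   log err = -log(delta) - (1/d) * log N
slope, intercept = np.polyfit(np.log(N), np.log(err), 1)

d_eff = -1.0 / slope        # effective dimensionality of the data as seen by the net
delta = np.exp(-intercept)  # saturation radius (absorbing any multiplicative constant)

def predicted_error(n):
    """Extrapolate the fitted power law to a new training-set size."""
    return 1.0 / (delta * n ** (1.0 / d_eff))

print(f"effective dimensionality d ~ {d_eff:.2f}")
print(f"radius delta ~ {delta:.4f}")
print(f"predicted error at N=32000: {predicted_error(32_000):.4f}")
```

Given fitted d and delta, the same formula answers the resource-allocation question raised above: how much more data is needed to reach a target error level.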
Related papers
- Estimating Uncertainty with Implicit Quantile Network [0.0]
Uncertainty quantification is an important part of many performance critical applications.
This paper provides a simple alternative to existing approaches such as ensemble learning and Bayesian neural networks.
arXiv Detail & Related papers (2024-08-26T13:33:14Z) - Neural Network Approximation for Pessimistic Offline Reinforcement
Learning [17.756108291816908]
We present a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation.
Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight.
arXiv Detail & Related papers (2023-12-19T05:17:27Z) - Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom holds that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z) - Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural
Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z) - Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our methods are simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z) - Certified machine learning: A posteriori error estimation for
physics-informed neural networks [0.0]
PINNs are known to be robust for smaller training sets, to exhibit better generalization properties, and to be faster to train.
We show that using PINNs in comparison with purely data-driven neural networks is not only favorable for training performance but allows us to extract significant information on the quality of the approximated solution.
arXiv Detail & Related papers (2022-03-31T14:23:04Z) - HYDRA: Hypergradient Data Relevance Analysis for Interpreting Deep
Neural Networks [51.143054943431665]
We propose Hypergradient Data Relevance Analysis, or HYDRA, which interprets predictions made by deep neural networks (DNNs) as effects of their training data.
HYDRA assesses the contribution of training data toward test data points throughout the training trajectory.
In addition, we quantitatively demonstrate that HYDRA outperforms influence functions in accurately estimating data contribution and detecting noisy data labels.
arXiv Detail & Related papers (2021-02-04T10:00:13Z) - Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z) - The training accuracy of two-layer neural networks: its estimation and
understanding using random datasets [0.0]
We propose a novel theory based on space partitioning to estimate the approximate training accuracy for two-layer neural networks on random datasets without training.
Our method estimates the training accuracy for two-layer fully-connected neural networks on two-class random datasets using only three arguments.
arXiv Detail & Related papers (2020-10-26T07:21:29Z) - Simple and Principled Uncertainty Estimation with Deterministic Deep
Learning via Distance Awareness [24.473250414880454]
We study principled approaches to high-quality uncertainty estimation that require only a single deep neural network (DNN).
By formalizing the uncertainty quantification as a minimax learning problem, we first identify input distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data in the input space.
We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs.
arXiv Detail & Related papers (2020-06-17T19:18:22Z) - On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.