Large Deviations for Classification Performance Analysis of Machine
Learning Systems
- URL: http://arxiv.org/abs/2301.07104v1
- Date: Mon, 16 Jan 2023 10:48:12 GMT
- Title: Large Deviations for Classification Performance Analysis of Machine
Learning Systems
- Authors: Paolo Braca, Leonardo M. Millefiori, Augusto Aubry, Antonio De Maio,
Peter Willett
- Abstract summary: We show that under appropriate conditions the error probabilities vanish exponentially, as $\sim \exp\left(-n\,I + o(n)\right)$, where $I$ is the error rate and $n$ is the number of observations available for testing.
The theoretical findings are finally tested using the popular MNIST dataset.
- Score: 16.74271332025289
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the performance of machine learning binary classification techniques
in terms of error probabilities. The statistical test is based on the
Data-Driven Decision Function (D3F), learned in the training phase, i.e., what
is thresholded before the final binary decision is made. Based on large
deviations theory, we show that under appropriate conditions the classification
error probabilities vanish exponentially, as $\sim \exp\left(-n\,I + o(n)
\right)$, where $I$ is the error rate and $n$ is the number of observations
available for testing. We also propose two different approximations for the
error probability curves, one based on a refined asymptotic formula (often
referred to as exact asymptotics), and another one based on the central limit
theorem. The theoretical findings are finally tested using the popular MNIST
dataset.
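
To make the claimed scaling concrete, here is a minimal Monte Carlo sketch (my own construction, not the paper's D3F pipeline): per-observation scores are assumed i.i.d. standard Gaussian under the null hypothesis, the decision thresholds their sample mean at a level $\gamma$, and the simulated false-alarm probability is compared against the large-deviations skeleton $\exp(-n\,I)$ and a CLT-style Gaussian approximation.
```python
# Minimal sketch (my construction, not the paper's D3F pipeline): Monte Carlo
# false-alarm probability of a thresholded sample mean of i.i.d. N(0, 1)
# scores, compared with the exp(-n I) skeleton and a CLT approximation.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
gamma = 0.5            # decision threshold on the sample mean (assumed)
I0 = gamma ** 2 / 2.0  # Cramer rate function of N(0, 1) scores at gamma
trials = 200_000

for n in (5, 10, 20, 40):
    means = rng.standard_normal((trials, n)).mean(axis=1)  # H0 statistics
    p_mc = np.mean(means > gamma)              # Monte Carlo error probability
    p_ld = np.exp(-n * I0)                     # leading-order exp(-n I)
    p_clt = norm.sf(np.sqrt(n) * gamma)        # Gaussian (CLT-style) approx
    print(f"n={n:3d}  MC={p_mc:.3e}  exp(-nI)={p_ld:.3e}  CLT={p_clt:.3e}")
```
In this Gaussian toy case the CLT expression is exact and $\exp(-n\,I)$ captures only the exponential order, which is precisely why refined exact-asymptotics corrections of the kind proposed in the paper matter at moderate $n$.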
Related papers
- A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models [45.60426164657739]
We develop non-asymptotic convergence theory for a diffusion-based sampler.
We prove that $d/\varepsilon$ iterations are sufficient to approximate the target distribution to within $\varepsilon$ total-variation distance.
Our results also characterize how $\ell_2$ score estimation errors affect the quality of the data generation processes.
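
To ground the terminology, the sketch below integrates a probability flow ODE in a toy case I've chosen (1-D Gaussian data under a VP forward SDE), where the score of $p_t$ is available in closed form, so discretization is the only error source; it illustrates the sampler family being analyzed, not the paper's setting.
```python
# Minimal sketch (assumed toy setup, not the paper's): probability flow ODE
# sampler for the VP forward SDE dx = -x/2 dt + dW with 1-D Gaussian data,
# so the score of p_t is exact and only discretization error remains.
import numpy as np

m0, s0 = 2.0, 0.5            # data distribution N(m0, s0^2)
T, steps, n = 8.0, 400, 100_000
dt = T / steps
rng = np.random.default_rng(0)
x = rng.standard_normal(n)   # start near the stationary N(0, 1) at time T

for k in range(steps):
    t = T - k * dt
    m_t = np.exp(-t / 2) * m0
    v_t = np.exp(-t) * s0 ** 2 + 1.0 - np.exp(-t)
    score = -(x - m_t) / v_t            # exact score of the Gaussian p_t
    drift = -0.5 * x - 0.5 * score      # probability flow ODE drift
    x = x - drift * dt                  # Euler step backwards in time

print(f"sample mean {x.mean():.3f} (target {m0}), std {x.std():.3f} (target {s0})")
```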
arXiv Detail & Related papers (2024-08-05T09:02:24Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Improved Analysis of Score-based Generative Modeling: User-Friendly
Bounds under Minimal Smoothness Assumptions [9.953088581242845]
We provide convergence guarantees with polynomial complexity for any data distribution with a finite second-order moment.
Our result does not rely on any log-concavity or functional inequality assumption.
Our theoretical analysis provides comparison between different discrete approximations and may guide the choice of discretization points in practice.
arXiv Detail & Related papers (2022-11-03T15:51:00Z) - Asymptotic Statistical Analysis of $f$-divergence GAN [13.587087960403199]
Generative Adversarial Networks (GANs) have achieved great success in data generation.
We consider the statistical behavior of the general $f$-divergence formulation of GAN.
The resulting estimation method is referred to as Adversarial Gradient Estimation (AGE).
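
For orientation, the general $f$-divergence GAN objective admits the standard variational (Fenchel-dual) form shown below; this is the textbook $f$-GAN formulation, and the paper's AGE estimator concerns gradient-based estimation within this minimax setting, not this display itself:
$$
D_f(p \,\|\, q_\theta) \;=\; \sup_{T}\,\Big\{ \mathbb{E}_{x \sim p}\big[T(x)\big] - \mathbb{E}_{x \sim q_\theta}\big[f^{*}(T(x))\big] \Big\},
$$
where $f^{*}$ is the convex conjugate of $f$ and the supremum runs over discriminators $T$; the generator parameters $\theta$ are then chosen to minimize the divergence.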
arXiv Detail & Related papers (2022-09-14T18:08:37Z) - Statistical Hypothesis Testing Based on Machine Learning: Large
Deviations Analysis [15.605887551756933]
We study the performance -- and specifically the rate at which the error probability converges to zero -- of Machine Learning (ML) classification techniques.
We provide the mathematical conditions for an ML classifier to exhibit error probabilities that vanish exponentially, say $\sim \exp\left(-n\,I + o(n)\right)$.
In other words, the convergence of the classification error probability to zero and its rate can be computed on a portion of the dataset available for training.
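
As a pointer to the standard machinery behind such statements (Cramér-type large deviations, not the paper's precise conditions): when the decision statistic is an average of i.i.d. per-sample scores $s_i$ thresholded at $\gamma$, the exponent is the Legendre transform of the log-moment generating function,
$$
\varphi(\lambda) = \log \mathbb{E}\big[e^{\lambda s_1}\big], \qquad
I(\gamma) = \sup_{\lambda \ge 0}\big\{\lambda\gamma - \varphi(\lambda)\big\}, \qquad
P\Big(\tfrac{1}{n}\textstyle\sum_{i=1}^{n} s_i > \gamma\Big) \approx e^{-n\,I(\gamma)}.
$$
Replacing the expectation in $\varphi$ with an empirical average over training samples is what makes the rate computable from the training portion of the dataset, as the summary states.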
arXiv Detail & Related papers (2022-07-22T08:30:10Z) - Understanding the Under-Coverage Bias in Uncertainty Estimation [58.03725169462616]
Quantile regression tends to under-cover relative to the desired coverage level in practice.
We prove that quantile regression suffers from an inherent under-coverage bias.
Our theory reveals that this under-coverage bias stems from a certain high-dimensional parameter estimation error.
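
The under-coverage effect is easy to probe empirically; the sketch below uses an assumed linear-model setup with scikit-learn's QuantileRegressor (my illustration, not the authors' experiments) and typically shows held-out coverage below the nominal 0.9 level when $d/n$ is non-negligible.
```python
# Minimal sketch (assumed linear setup): fit a quantile regressor at level
# 0.9 and measure empirical coverage on held-out data.
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n, d = 200, 50                       # dimension not small relative to n
w = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w + rng.standard_normal(n)

qr = QuantileRegressor(quantile=0.9, alpha=0.0).fit(X, y)

X_te = rng.standard_normal((20_000, d))
y_te = X_te @ w + rng.standard_normal(20_000)
coverage = np.mean(y_te <= qr.predict(X_te))
print(f"nominal 0.9, empirical coverage {coverage:.3f}")  # typically < 0.9
```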
arXiv Detail & Related papers (2021-06-10T06:11:55Z) - Evaluating State-of-the-Art Classification Models Against Bayes
Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
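
To illustrate the quantity being computed, the toy sketch below evaluates the exact Bayes error from known class-conditional densities with equal priors; the Gaussians here are stand-ins I've chosen for the normalizing-flow densities the paper actually uses.
```python
# Minimal sketch (toy densities in place of normalizing-flow densities):
# with equal priors, the exact Bayes error of a binary problem is
# 0.5 * integral of min(p0, p1).
import numpy as np
from scipy.stats import norm

grid = np.linspace(-10.0, 10.0, 20_001)
dx = grid[1] - grid[0]
p0 = norm.pdf(grid, loc=-1.0)   # class-0 density (stand-in for a flow)
p1 = norm.pdf(grid, loc=+1.0)   # class-1 density
bayes_error = 0.5 * np.sum(np.minimum(p0, p1)) * dx
print(f"Bayes error {bayes_error:.4f}")   # Phi(-1) ~= 0.1587 here
```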
arXiv Detail & Related papers (2021-06-07T06:21:20Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian
Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the number of random features $N$ are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
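
For reference, the object under study can be written down in a few lines; the sketch below is the standard random Fourier feature construction for a Gaussian kernel followed by ridge regression (generic textbook code with $n$, $p$, $N$ named as in the summary, not the article's asymptotic analysis).
```python
# Minimal sketch (standard RFF construction): random Fourier features
# approximating a Gaussian kernel, followed by ridge regression.
import numpy as np

rng = np.random.default_rng(0)
n, p, N = 500, 20, 300                  # samples, input dim, feature count
sigma, lam = 2.0, 1e-2                  # kernel bandwidth, ridge penalty

X = rng.standard_normal((n, p))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

W = rng.standard_normal((N, p)) / sigma      # frequencies for exp(-|x-y|^2 / (2 sigma^2))
b = rng.uniform(0, 2 * np.pi, size=N)        # random phases
Z = np.sqrt(2.0 / N) * np.cos(X @ W.T + b)   # feature map, E[z(x) . z(y)] = k(x, y)

beta = np.linalg.solve(Z.T @ Z + lam * np.eye(N), Z.T @ y)  # ridge solution
print(f"train MSE {np.mean((Z @ beta - y) ** 2):.4f}")
```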
arXiv Detail & Related papers (2020-06-09T02:05:40Z) - Error bounds in estimating the out-of-sample prediction error using
leave-one-out cross validation in high-dimensions [19.439945058410203]
We study the problem of out-of-sample risk estimation in the high dimensional regime.
Extensive empirical evidence confirms the accuracy of leave-one-out cross validation.
One technical advantage of the theory is that it can be used to clarify and connect some results from the recent literature on scalable approximate LO.
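
The exact-shortcut flavor of leave-one-out underlying this line of work is easiest to see for ridge regression, where all LOO residuals follow from a single fit via the hat-matrix identity $e_i / (1 - H_{ii})$; the sketch below (a textbook identity, not the paper's high-dimensional theory) verifies it against a brute-force fold.
```python
# Minimal sketch (classic ridge shortcut): exact leave-one-out residuals
# from a single fit via e_i / (1 - H_ii), checked against brute force.
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 100, 30, 1.0
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix
loo = (y - H @ y) / (1.0 - np.diag(H))                   # exact LOO residuals

# brute-force check on one held-out fold
i = 0
Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
beta_i = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
print(loo[0], y[0] - X[0] @ beta_i)                      # should match
```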
arXiv Detail & Related papers (2020-03-03T20:07:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.