Testing for Overfitting
- URL: http://arxiv.org/abs/2305.05792v1
- Date: Tue, 9 May 2023 22:49:55 GMT
- Title: Testing for Overfitting
- Authors: James Schmidt
- Abstract summary: We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data.
We introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data and overfitting quantitatively defined and detected.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High complexity models are notorious in machine learning for overfitting, a
phenomenon in which models well represent data but fail to generalize an
underlying data generating process. A typical procedure for circumventing
overfitting computes empirical risk on a holdout set and halts once (or flags
that/when) it begins to increase. Such practice often helps in outputting a
well-generalizing model, but justification for why it works is primarily
heuristic.
We discuss the overfitting problem and explain why standard asymptotic and
concentration results do not hold for evaluation with training data. We then
proceed to introduce and argue for a hypothesis test by means of which both
model performance may be evaluated using training data, and overfitting
quantitatively defined and detected. We rely on said concentration bounds, which
guarantee that an empirical mean should, with high probability, approximate its
true mean, to conclude that two such empirical means should approximate each other. We stipulate
conditions under which this test is valid, describe how the test may be used
for identifying overfitting, articulate a further nuance according to which
distributional shift may be flagged, and highlight an alternative notion of
learning which usefully captures generalization in the absence of uniform PAC
guarantees.
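The core idea of the abstract lends itself to a small illustration. The following is a minimal sketch, not the paper's exact procedure: it assumes a loss bounded in [0, 1] and an evaluation sample drawn independently of the fitted model, applies Hoeffding's inequality to each empirical risk, and flags overfitting (or distributional shift) when the gap between the two empirical risks exceeds the combined concentration radii. The function names and the significance level `delta` are illustrative assumptions.

```python
import math
import numpy as np

def concentration_radius(n: int, delta: float) -> float:
    """Hoeffding radius: with probability >= 1 - delta, the empirical mean of n
    i.i.d. losses bounded in [0, 1] lies within this distance of its true mean."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def overfitting_test(train_losses, eval_losses, delta: float = 0.05) -> dict:
    """Hypothetical test (not the paper's exact construction): flag overfitting
    when the gap between the two empirical risks exceeds what concentration
    around a common true risk would allow. Assumes losses in [0, 1] and that
    eval_losses come from data not used to fit the model."""
    train_losses = np.asarray(train_losses, dtype=float)
    eval_losses = np.asarray(eval_losses, dtype=float)
    gap = abs(train_losses.mean() - eval_losses.mean())
    # Union bound: split the allowed failure probability across the two samples.
    threshold = (concentration_radius(len(train_losses), delta / 2)
                 + concentration_radius(len(eval_losses), delta / 2))
    return {"gap": gap, "threshold": threshold, "flag_overfitting": gap > threshold}
```

Under these assumptions, a gap larger than the threshold occurs with probability at most delta if both empirical risks concentrate around the same true risk, which is the sense in which such a test quantifies overfitting.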
Related papers
- Demystifying amortized causal discovery with transformers [21.058343547918053]
Supervised learning approaches for causal discovery from observational data often achieve competitive performance.
In this work, we investigate CSIvA, a transformer-based model promising to train on synthetic data and transfer to real data.
We bridge the gap with existing identifiability theory and show that constraints on the training data distribution implicitly define a prior on the test observations.
arXiv Detail & Related papers (2024-05-27T08:17:49Z)
- User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems [49.75149094527068]
We show that diffusion models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems.
We develop a probabilistic approximation scheme for the conditional score function which converges to the true distribution as the noise level decreases.
We are able to sample conditionally on nonlinear user-defined events at inference time, and the samples match data statistics even when drawn from the tails of the distribution.
arXiv Detail & Related papers (2023-06-13T03:42:03Z)
- Intervention Generalization: A View from Factor Graph Models [7.117681268784223]
We take a close look at how to warrant a leap from past experiments to novel conditions based on minimal assumptions about the factorization of the distribution of the manipulated system.
A postulated $\textit{interventional factor model}$ (IFM) may not always be informative, but it conveniently abstracts away a need for explicitly modeling unmeasured confounding and feedback mechanisms.
arXiv Detail & Related papers (2023-06-06T21:44:23Z)
- Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations [67.40641255908443]
We identify limitations of model-randomization-based sanity checks for the purpose of evaluating explanations.
Top-down model randomization preserves scales of forward pass activations with high probability.
arXiv Detail & Related papers (2022-11-22T18:52:38Z)
- Evaluating Causal Inference Methods [0.4588028371034407]
We introduce a deep generative model-based framework, Credence, to validate causal inference methods.
arXiv Detail & Related papers (2022-02-09T00:21:22Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
We introduce a method that leverages unlabeled data to produce generalization bounds.
We prove that our bound is valid for 0-1 empirical risk minimization.
This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
arXiv Detail & Related papers (2021-05-01T17:05:29Z)
- Testing for Typicality with Respect to an Ensemble of Learned Distributions [5.850572971372637]
One-sample approaches to the goodness-of-fit problem offer significant computational advantages for online testing.
The ability to correctly reject anomalous data in this setting hinges on the accuracy of the model of the base distribution.
Existing methods for the one-sample goodness-of-fit problem do not account for the fact that a model of the base distribution is learned.
We propose training an ensemble of density models, considering data to be anomalous if the data is anomalous with respect to any member of the ensemble.
arXiv Detail & Related papers (2020-11-11T19:47:46Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.