Understanding the double descent curve in Machine Learning
- URL: http://arxiv.org/abs/2211.10322v1
- Date: Fri, 18 Nov 2022 16:27:05 GMT
- Title: Understanding the double descent curve in Machine Learning
- Authors: Luis Sa-Couto, Jose Miguel Ramos, Miguel Almeida, Andreas Wichert
- Abstract summary: We develop a principled understanding of the phenomenon, and sketch answers to important questions.
We report real experimental results that are correctly predicted by our proposed hypothesis.
- Score: 1.8065361710947976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The theory of bias-variance used to serve as a guide for model selection when
applying Machine Learning algorithms. However, modern practice has shown
success with over-parameterized models that were expected to overfit but did
not. This led to the proposal of the double descent curve of performance by
Belkin et al. Although it seems to describe a real, representative phenomenon,
the field is lacking a fundamental theoretical understanding of what is
happening, what are the consequences for model selection and when is double
descent expected to occur. In this paper we develop a principled understanding
of the phenomenon, and sketch answers to these important questions.
Furthermore, we report real experimental results that are correctly predicted
by our proposed hypothesis.
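The double descent curve itself can be reproduced with a minimal sketch (an illustration under assumed settings, not the paper's experiments): fit minimum-norm least squares on random ReLU features of noisy data and sweep the number of features across the interpolation threshold, where test error typically peaks before descending again.

```python
import numpy as np

def double_descent_curve(n_train=50, n_test=500, d=5, noise=0.5,
                         widths=(10, 25, 45, 50, 55, 100, 400),
                         trials=20, seed=0):
    """Average test MSE of minimum-norm least squares on random ReLU
    features, as the feature count p sweeps across p = n_train."""
    rng = np.random.default_rng(seed)
    errs = {p: 0.0 for p in widths}
    for _ in range(trials):
        beta = rng.normal(size=d)                     # true linear signal
        X = rng.normal(size=(n_train, d))
        Xt = rng.normal(size=(n_test, d))
        y = X @ beta + noise * rng.normal(size=n_train)
        yt = Xt @ beta                                # noiseless test targets
        for p in widths:
            W = rng.normal(size=(d, p)) / np.sqrt(d)  # random feature map
            Phi = np.maximum(X @ W, 0)                # ReLU features
            Phit = np.maximum(Xt @ W, 0)
            w = np.linalg.pinv(Phi) @ y               # minimum-norm solution
            errs[p] += np.mean((Phit @ w - yt) ** 2) / trials
    return errs

errs = double_descent_curve()
```

With the seed fixed, the error near p = n_train is typically well above both the under- and over-parameterized regimes; the second descent is the decline past that peak as p grows further.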
Related papers
- Class-wise Activation Unravelling the Engima of Deep Double Descent [0.0]
Double descent presents a counter-intuitive aspect within the machine learning domain.
In this study, we revisited the phenomenon of double descent and discussed the conditions of its occurrence.
arXiv Detail & Related papers (2024-05-13T12:07:48Z) - On the nonconvexity of some push-forward constraints and its consequences in machine learning [0.0]
The push-forward operation enables one to redistribute a convex probability measure through a map.
It plays a key role in statistics and optimization: many problems, notably from optimal transport, involve push-forward constraints.
This paper aims to help researchers better understand push-forward constraints in predictor and algorithmic learning problems.
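The push-forward operation above has a simple sampling picture: drawing from a base measure and applying a map T yields samples from the push-forward T#mu (a minimal illustration, not taken from the paper):

```python
import numpy as np

# Push-forward of a measure through a map T: if X ~ mu, then T(X) ~ T#mu.
rng = np.random.default_rng(0)
x = rng.normal(size=200_000)   # X ~ N(0, 1), the base measure mu
y = np.exp(x)                  # T(x) = exp(x); T#mu is log-normal(0, 1)

# Moments of the push-forward follow from mu: E[exp(X)] = exp(1/2).
print(y.mean())                # close to e**0.5 ≈ 1.6487
```

Constraints of the form "T#mu equals a target measure" are what the paper shows can be nonconvex in T.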
arXiv Detail & Related papers (2024-03-12T10:06:48Z) - A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We show that where double descent appears depends on how model parameters are counted, and that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z) - Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z) - Theoretical and Practical Perspectives on what Influence Functions Do [45.35457212616306]
Influence functions (IF) are a technique for explaining model predictions through the lens of the training data.
Recent empirical studies have shown that the existing methods of estimating IF predict the leave-one-out-and-retrain effect poorly.
We show that while most assumptions can be addressed successfully, the parameter divergence poses a clear limitation on the predictive power of IF.
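The leave-one-out-and-retrain effect that IF try to predict can be made concrete for least squares, where an exact closed form exists; the sketch below (illustrative settings, not the paper's setup) compares the first-order influence-function estimate of the parameter change against the exact leave-one-out change.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=n)

H_inv = np.linalg.inv(X.T @ X)              # inverse Hessian of squared loss
theta = H_inv @ X.T @ y                     # full-data least squares fit
resid = y - X @ theta
h = np.einsum('ij,jk,ik->i', X, H_inv, X)   # leverage scores h_ii

i = 0
# Influence-function (first-order) estimate of the change from removing i:
delta_if = H_inv @ X[i] * resid[i]
# Exact leave-one-out change (Sherman-Morrison closed form):
delta_exact = H_inv @ X[i] * resid[i] / (1 - h[i])
# Ground truth: actually retrain without point i.
theta_loo = np.linalg.lstsq(np.delete(X, i, 0), np.delete(y, i),
                            rcond=None)[0]
```

The IF estimate differs from the exact change by a factor 1/(1 - h_ii), so it is accurate exactly when the removed point has low leverage; the paper's "parameter divergence" limitation concerns the much harder non-convex setting.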
arXiv Detail & Related papers (2023-05-26T14:26:36Z) - A Mathematical Framework for Learning Probability Distributions [0.0]
Generative modeling and density estimation have become immensely popular subjects in recent years.
This paper provides a mathematical framework such that all the well-known models can be derived based on simple principles.
In particular, we prove that these models enjoy implicit regularization during training, so that the generalization error at early-stopping avoids the curse of dimensionality.
arXiv Detail & Related papers (2022-12-22T04:41:45Z) - Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner.
We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances iterative and amortized inference to obtain accurate beliefs using minimum computational expense.
arXiv Detail & Related papers (2022-04-05T12:52:45Z) - On the Role of Optimization in Double Descent: A Least Squares Study [30.44215064390409]
We show an excess risk bound for the gradient descent solution of the least squares objective.
We find that in case of noiseless regression, double descent is explained solely by optimization-related quantities.
We empirically explore if our predictions hold for neural networks.
arXiv Detail & Related papers (2021-07-27T09:13:11Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2020-12-31T07:24:30Z) - Why do classifier accuracies show linear trends under distribution shift? [58.40438263312526]
Accuracies of models on one data distribution are approximately linear functions of their accuracies on another distribution.
We assume the probability that two models agree in their predictions is higher than what we can infer from their accuracy levels alone.
We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z) - Double Robust Representation Learning for Counterfactual Prediction [68.78210173955001]
We propose a novel scalable method to learn double-robust representations for counterfactual predictions.
We make robust and efficient counterfactual predictions for both individual and average treatment effects.
The algorithm shows competitive performance with the state-of-the-art on real world and synthetic data.
arXiv Detail & Related papers (2020-10-15T16:39:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed (including all abstracts) and is not responsible for any consequences of its use.