Evaluating Double Descent in Machine Learning: Insights from Tree-Based Models Applied to a Genomic Prediction Task
- URL: http://arxiv.org/abs/2509.25216v1
- Date: Mon, 22 Sep 2025 16:41:31 GMT
- Title: Evaluating Double Descent in Machine Learning: Insights from Tree-Based Models Applied to a Genomic Prediction Task
- Authors: Guillermo Comesaña Cimadevila
- Abstract summary: Recent work has introduced the notion of a second descent in test error beyond the interpolation threshold, giving rise to the so-called double descent phenomenon. We show that double descent consistently emerges only when complexity is scaled jointly across two axes. Our findings underscore the importance of treating model complexity as a multidimensional construct when analysing generalisation behaviour.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classical learning theory describes a well-characterised U-shaped relationship between model complexity and prediction error, reflecting a transition from underfitting in underparameterised regimes to overfitting as complexity grows. Recent work, however, has introduced the notion of a second descent in test error beyond the interpolation threshold, giving rise to the so-called double descent phenomenon. While double descent has been studied extensively in the context of deep learning, it has also been reported in simpler models, including decision trees and gradient boosting. In this work, we revisit these claims through the lens of classical machine learning applied to a biological classification task: predicting isoniazid resistance in Mycobacterium tuberculosis using whole-genome sequencing data. We systematically vary model complexity along two orthogonal axes, learner capacity (e.g., Pleaf, Pboost) and ensemble size (i.e., Pens), and show that double descent consistently emerges only when complexity is scaled jointly across these axes. When either axis is held fixed, generalisation behaviour reverts to classical U- or L-shaped patterns. These results are replicated on a synthetic benchmark and support the unfolding hypothesis, which attributes double descent to the projection of distinct generalisation regimes onto a single complexity axis. Our findings underscore the importance of treating model complexity as a multidimensional construct when analysing generalisation behaviour. All code and reproducibility materials are available at: https://github.com/guillermocomesanacimadevila/Demystifying-Double-Descent-in-ML.
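The idea of projecting two complexity axes onto a single one can be made concrete with a small sketch. The code below is not taken from the paper's repository. It assumes a composite schedule in which learner capacity (leaves per tree) is grown first with the smallest ensemble, after which ensemble size is grown with capacity held at its maximum. The names `Config` and `composite_axis` and the grid values are hypothetical, chosen only for illustration.

```python
from typing import List, NamedTuple, Sequence


class Config(NamedTuple):
    """One point in the two-dimensional complexity space (hypothetical helper)."""
    p_leaf: int  # learner capacity: maximum leaves per tree
    p_ens: int   # ensemble size: number of trees

    @property
    def total(self) -> int:
        # Single-number "parameter count" used as the composite x-axis.
        return self.p_leaf * self.p_ens


def composite_axis(leaf_grid: Sequence[int], ens_grid: Sequence[int]) -> List[Config]:
    """Build a one-dimensional sweep out of two axes (assumed schedule).

    Phase 1 grows leaves per tree with the smallest ensemble size.
    Phase 2 grows the ensemble with leaves fixed at the largest value.
    """
    leaves = sorted(leaf_grid)
    ensembles = sorted(ens_grid)
    phase1 = [Config(leaf, ensembles[0]) for leaf in leaves]
    phase2 = [Config(leaves[-1], ens) for ens in ensembles[1:]]
    return phase1 + phase2


if __name__ == "__main__":
    for cfg in composite_axis([2, 4, 8], [1, 2, 4]):
        print(cfg.p_leaf, cfg.p_ens, cfg.total)
```

Plotting test error against `total` along this path mixes two regimes on one axis, whereas plotting against `p_leaf` or `p_ens` alone with the other held fixed keeps them separate. With scikit-learn, these two values could plausibly be mapped to the `max_leaf_nodes` and `n_estimators` parameters of `RandomForestClassifier`, though that mapping is an assumption here rather than something the abstract states.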
Related papers
- The Double Descent Behavior in Two Layer Neural Network for Binary Classification [46.3107850275261]
Recent studies observed a surprising concept on model test error called the double descent phenomenon. Our aim is to observe and investigate the mathematical theory behind the double descent behavior of model test error for varying model sizes.
arXiv Detail & Related papers (2025-04-27T20:29:24Z)
- Towards understanding epoch-wise double descent in two-layer linear neural networks [11.210628847081097]
We study epoch-wise double descent in two-layer linear neural networks.
We identify additional factors of epoch-wise double descent emerging with the extra model layer.
This opens up for further questions regarding unidentified factors of epoch-wise double descent for truly deep models.
arXiv Detail & Related papers (2024-07-13T10:45:21Z)
- Understanding the Double Descent Phenomenon in Deep Learning [49.1574468325115]
This tutorial sets the classical statistical learning framework and introduces the double descent phenomenon.
By looking at a number of examples, section 2 introduces inductive biases that appear to have a key role in double descent by selecting.
Section 3 explores double descent with two linear models and gives other points of view from recent related works.
arXiv Detail & Related papers (2024-03-15T16:51:24Z)
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We show that the second descent appears exactly where the transition between the underlying complexity axes occurs, and that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery has proposed to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- Asymptotics of Ridge Regression in Convolutional Models [26.910291664252973]
We derive exact formulae for estimation error of ridge estimators that hold in a certain high-dimensional regime.
We show the double descent phenomenon in our experiments for convolutional models and show that our theoretical results match the experiments.
arXiv Detail & Related papers (2021-03-08T05:56:43Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)