Asymptotic Theory and Phase Transitions for Variable Importance in Quantile Regression Forests
- URL: http://arxiv.org/abs/2511.23212v1
- Date: Fri, 28 Nov 2025 14:18:05 GMT
- Title: Asymptotic Theory and Phase Transitions for Variable Importance in Quantile Regression Forests
- Authors: Tomoshige Nakamura, Hiroshi Shiraishi
- Abstract summary: We develop a theory for intrinsic variable importance defined as the difference in pinball loss risks. We prove that in the bias-dominated regime ($\beta \ge 1/2$), standard inference breaks down as the estimator converges to a deterministic bias constant rather than a zero-mean normal distribution.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantile Regression Forests (QRF) are widely used for non-parametric conditional quantile estimation, yet statistical inference for variable importance measures remains challenging due to the non-smoothness of the loss function and the complex bias-variance trade-off. In this paper, we develop an asymptotic theory for variable importance defined as the difference in pinball loss risks. We first establish the asymptotic normality of the QRF estimator by handling the non-differentiable pinball loss via Knight's identity. Second, we uncover a "phase transition" phenomenon governed by the subsampling rate $\beta$ (where $s \asymp n^\beta$). We prove that in the bias-dominated regime ($\beta \ge 1/2$), which corresponds to the large subsample sizes typically favored in practice to maximize predictive accuracy, standard inference breaks down as the estimator converges to a deterministic bias constant rather than a zero-mean normal distribution. Finally, we derive the explicit analytic form of this asymptotic bias and discuss the theoretical feasibility of restoring valid inference via analytic bias correction. Our results highlight a fundamental trade-off between predictive performance and inferential validity, providing a theoretical foundation for understanding the intrinsic limitations of random forest inference in high-dimensional settings.
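To make the importance measure concrete, here is a minimal sketch of a pinball-loss risk difference; the `predict_quantile` interface and the two-model comparison are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def pinball_risk(y, q_hat, tau):
    """Empirical pinball (check) loss at quantile level tau."""
    u = y - q_hat
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

def importance_by_risk_difference(model_full, model_reduced, X, y, tau=0.5):
    """Variable importance as a difference in pinball-loss risks: the
    risk of a QRF fit without the variable of interest minus the risk
    of the full fit. `predict_quantile` is a hypothetical interface,
    not the authors' implementation."""
    risk_full = pinball_risk(y, model_full.predict_quantile(X, tau), tau)
    risk_reduced = pinball_risk(y, model_reduced.predict_quantile(X, tau), tau)
    return risk_reduced - risk_full
```

Under the paper's asymptotics, the sampling distribution of such an estimate is governed by the subsampling rate $\beta$, and it is no longer centered at a zero-mean normal once $\beta \ge 1/2$.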
Related papers
- The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss [53.542743390809356]
This paper aims to provide a first-principles analysis of the Expectation of Optimization Bias (EOB). Our analysis reveals a fundamental paradox: the more deterministic and structured the time series, the more severe the bias induced by point-wise loss functions. We present a concrete solution that simultaneously achieves both principles via the DFT or DWT.
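The summary only names the transform; one plausible reading (an assumption, not necessarily the paper's exact objective) is a loss computed on DFT coefficients, where non-uniform frequency weights are what distinguish it from the point-wise loss:

```python
import numpy as np

def pointwise_mse(pred, target):
    """Plain point-wise loss, the object whose bias the paper analyzes."""
    return np.mean((pred - target) ** 2)

def dft_domain_mse(pred, target, weights=None):
    """Loss on DFT coefficients. With uniform weights this is
    proportional to the point-wise MSE (Parseval's theorem); a
    non-uniform weighting of frequency bins is one way a
    transform-domain loss can depart from the point-wise one.
    A hedged sketch; the paper's objective may differ."""
    err = np.fft.fft(pred) - np.fft.fft(target)
    power = np.abs(err) ** 2
    if weights is not None:
        power = weights * power
    return np.mean(power)
```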
arXiv Detail & Related papers (2025-12-21T06:08:22Z) - Learning bounds for doubly-robust covariate shift adaptation [8.24901041136559]
Distribution shift between the training domain and the test domain poses a key challenge for machine learning. The doubly-robust (DR) estimator combines density ratio estimation with a pilot regression model. This paper establishes the first non-asymptotic learning bounds for the DR estimator.
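For orientation, a standard doubly-robust construction under covariate shift looks like the following sketch; the mean-outcome target and the variable names are illustrative, and the paper's estimator may differ in detail:

```python
import numpy as np

def dr_mean_estimate(y_tr, m_tr, w_tr, m_te):
    """Doubly-robust estimate of the test-domain mean outcome under
    covariate shift: a pilot regression model m(x) plus a
    density-ratio-weighted residual correction with
    w(x) = p_test(x) / p_train(x). The estimate stays consistent if
    either the pilot model or the density ratio is correct.

    y_tr : outcomes on training data
    m_tr : pilot-model predictions on training covariates
    w_tr : estimated density ratios on training covariates
    m_te : pilot-model predictions on test covariates
    """
    direct = np.mean(m_te)                      # plug-in term
    correction = np.mean(w_tr * (y_tr - m_tr))  # weighted residual term
    return direct + correction
```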
arXiv Detail & Related papers (2025-11-14T06:46:23Z) - Understanding Robust Machine Learning for Nonparametric Regression with Heavy-Tailed Noise [10.844819221753042]
We use Huber regression as a close-up example within Tikhonov-regularized risk minimization. We address two central challenges: (i) the breakdown of standard concentration tools under weak moment assumptions, and (ii) the analytical difficulties introduced by unbounded hypothesis spaces. Our study delivers principled rules, extends beyond Huber to other robust losses, and highlights prediction error, not excess risk, as the fundamental lens for analyzing robust learning.
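As a reference point for the estimator class being analyzed, a minimal Tikhonov-regularized Huber fit might look like this; the hyperparameters and the plain gradient-descent solver are illustrative assumptions, not the paper's rules:

```python
import numpy as np

def huber_ridge(X, y, delta=1.0, lam=1e-2, lr=0.1, n_iter=500):
    """Tikhonov-regularized Huber regression by gradient descent.
    The Huber loss is quadratic for |r| <= delta and linear beyond,
    so its gradient in r is the clipped residual."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        r = X @ w - y
        g = X.T @ np.clip(r, -delta, delta) / n + lam * w
        w -= lr * g
    return w
```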
arXiv Detail & Related papers (2025-10-10T21:57:18Z) - Disentangled Feature Importance [0.0]
We introduce Disentangled Feature Importance (DFI), a nonparametric generalization of the classical $R^2$ decomposition via optimal transport. DFI transforms correlated features into independent latent variables using a transport map, eliminating correlation distortion. DFI provides a principled decomposition of importance scores that sum to the total predictive variability for latent additive models.
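A simplified stand-in for the transport step, assuming linear whitening (which is exact only for Gaussian features; DFI itself uses a general optimal-transport map):

```python
import numpy as np

def whitening_transport(X):
    """Map correlated features to (approximately) independent latent
    variables via linear whitening: the simplest transport map, used
    here only to illustrate the disentangling step."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    # inverse matrix square root via eigendecomposition
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, 1e-12, None)  # guard against near-zero eigenvalues
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return Xc @ W
```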
arXiv Detail & Related papers (2025-06-30T20:54:48Z) - A Unified Analysis for Finite Weight Averaging [50.75116992029417]
Averaging iterations of Stochastic Gradient Descent (SGD) has achieved empirical success in training deep learning models, via schemes such as Stochastic Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA).
In this paper, we generalize LAWA as Finite Weight Averaging (FWA) and explain its advantages compared to SGD from the perspective of optimization and generalization.
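The averaging scheme itself is simple; a sketch of finite weight averaging over the last k checkpoints (illustrative, not the paper's code):

```python
import numpy as np

def finite_weight_average(checkpoints, k):
    """Finite Weight Averaging: average the parameters of the last k
    SGD checkpoints (each a NumPy array of weights). k = 1 returns
    the latest iterate, a small window recovers a LAWA-style average,
    and k = len(checkpoints) a SWA-style full average."""
    window = checkpoints[-k:]
    return sum(window) / len(window)
```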
arXiv Detail & Related papers (2024-11-20T10:08:22Z) - Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such intervals via the empirical estimation of quantiles in the distribution of outputs. We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile-regression-based interval construction that removes this arbitrary constraint. We demonstrate that this added flexibility yields intervals with improved desirable qualities.
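For contrast, the fixed-level quantile-pair objective that RQR relaxes can be written as below; the `alpha` parameterization is the standard construction, not RQR itself:

```python
import numpy as np

def pinball(u, tau):
    """Pinball loss of residual u at quantile level tau."""
    return np.maximum(tau * u, (tau - 1.0) * u)

def quantile_pair_objective(y, lo, hi, alpha=0.1):
    """Standard interval construction: the lower and upper bounds are
    tied to the fixed quantile levels alpha/2 and 1 - alpha/2. RQR, as
    summarized above, removes this fixed-level constraint and learns
    the pair of bounds directly (not shown here)."""
    return np.mean(pinball(y - lo, alpha / 2) + pinball(y - hi, 1 - alpha / 2))
```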
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - Variational Bayesian Neural Networks via Resolution of Singularities [1.2183405753834562]
We advocate for the importance of singular learning theory (SLT) as it pertains to the theory and practice of variational inference in Bayesian neural networks (BNNs).
We lay to rest some of the confusion surrounding discrepancies between downstream predictive performance, measured via, e.g., the test log predictive density, and the variational objective.
We use the SLT-corrected form for singular posterior distributions to inform the design of the variational family itself.
arXiv Detail & Related papers (2023-02-13T00:32:49Z) - Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
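The simplest instance of the idea is one-dimensional, where a monotone flow gives the CDF in closed form via its inverse (a sketch under stated assumptions; the paper treats closed regions in higher dimensions):

```python
import numpy as np
from scipy.stats import norm

def flow_cdf_1d(x, inverse_flow):
    """CDF of X = f(Z) with Z ~ N(0, 1) and f monotone increasing:
    F_X(x) = Phi(f^{-1}(x)). Uses only the flow's invertibility, the
    same diffeomorphic property the paper exploits in higher dimensions."""
    return norm.cdf(inverse_flow(np.asarray(x)))

# Example: an affine flow f(z) = mu + sigma * z recovers the
# Gaussian CDF with mean mu and scale sigma.
mu, sigma = 1.0, 2.0
p = flow_cdf_1d(1.0, lambda x: (x - mu) / sigma)  # equals 0.5 at x = mu
```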
arXiv Detail & Related papers (2022-02-23T06:11:49Z) - Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
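For context, the RF regression model under study can be sketched as random Fourier features followed by a linear fit; the closed-form ridge solution below is a simplification of the SGD training the paper analyzes:

```python
import numpy as np

def random_fourier_features(X, n_features, gamma=1.0, seed=0):
    """Random Fourier features approximating an RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def fit_rf_regression(X, y, n_features=200, lam=1e-6):
    """Ridge fit on random features. Double descent appears as
    n_features crosses the interpolation threshold; the paper tracks
    the SGD trajectory instead of this closed-form solution."""
    Phi = random_fourier_features(X, n_features)
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_features), Phi.T @ y)
```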
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Understanding the Under-Coverage Bias in Uncertainty Estimation [58.03725169462616]
Quantile regression tends to under-cover relative to the desired coverage level in reality.
We prove that quantile regression suffers from an inherent under-coverage bias.
Our theory reveals that this under-coverage bias stems from a certain high-dimensional parameter estimation error.
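The bias is easy to measure empirically; a minimal coverage diagnostic (illustrative):

```python
import numpy as np

def empirical_coverage(y, lo, hi):
    """Fraction of outcomes falling inside the predicted intervals.
    Comparing this against the nominal level (e.g., 0.90) is the basic
    check for the under-coverage bias described above."""
    return np.mean((y >= lo) & (y <= hi))
```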
arXiv Detail & Related papers (2021-06-10T06:11:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.