Interpretable random forest models through forward variable selection
- URL: http://arxiv.org/abs/2005.05113v1
- Date: Mon, 11 May 2020 13:56:49 GMT
- Title: Interpretable random forest models through forward variable selection
- Authors: Jasper Velthoen, Juan-Juan Cai, Geurt Jongbloed
- Abstract summary: We develop a forward variable selection method using the continuous ranked probability score (CRPS) as the loss function.
We demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random forest is a popular prediction approach for handling high
dimensional covariates. However, the resulting high dimensional,
non-parametric model is often infeasible to interpret. Aiming to obtain an
interpretable predictive model, we develop a forward variable selection
method using the continuous ranked probability score (CRPS) as the loss
function. Our stepwise procedure leads to the smallest set of variables that
optimizes the CRPS risk, by performing at each step a hypothesis test for a
significant decrease in CRPS risk. We provide mathematical motivation for
our method by proving that, in the population sense, the method attains the
optimal set. Additionally, we show that the test is consistent provided that
the random forest estimator of a quantile function is consistent.
In a simulation study, we compare the performance of our method with an
existing variable selection method, for different sample sizes and different
correlation strengths among the covariates. Our method is observed to have a
much lower false positive rate. We also demonstrate an application of our
method to statistical post-processing of daily maximum temperature forecasts
in the Netherlands. Our method selects about 10% of the covariates while
retaining the same predictive power.
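To make the stepwise idea concrete: for an ensemble forecast with members X
and X' drawn independently from the predictive distribution F, the empirical
CRPS against an observation y is CRPS(F, y) = E|X - y| - (1/2)E|X - X'|.
Below is a minimal, hypothetical Python sketch of CRPS-based forward
selection. It approximates a quantile-style random forest by the per-tree
predictions of scikit-learn's RandomForestRegressor, and stands in for the
paper's hypothesis test with a one-sided paired t-test on per-observation
CRPS values; the function names, the train/validation split, and the
significance level alpha are illustrative assumptions, not the authors'
implementation.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor


def empirical_crps(ensemble, y):
    """Per-observation empirical CRPS: E|X - y| - 0.5 * E|X - X'|.

    `ensemble` has shape (n_members, n_obs); `y` has shape (n_obs,).
    """
    term1 = np.abs(ensemble - y).mean(axis=0)
    term2 = np.abs(ensemble[:, None, :] - ensemble[None, :, :]).mean(axis=(0, 1))
    return term1 - 0.5 * term2


def crps_for_subset(X_tr, y_tr, X_va, y_va, cols, seed=0):
    """Fit a forest on the given columns; return per-observation CRPS on
    the validation set, using per-tree predictions as ensemble members."""
    rf = RandomForestRegressor(n_estimators=100, random_state=seed)
    rf.fit(X_tr[:, cols], y_tr)
    members = np.stack([tree.predict(X_va[:, cols]) for tree in rf.estimators_])
    return empirical_crps(members, y_va)


def forward_select(X_tr, y_tr, X_va, y_va, alpha=0.05):
    """Greedily add the covariate with the largest CRPS reduction; stop
    once the reduction is no longer significant (stand-in t-test)."""
    selected = []
    remaining = list(range(X_tr.shape[1]))
    current = None  # per-observation CRPS of the current model
    while remaining:
        scores = {j: crps_for_subset(X_tr, y_tr, X_va, y_va, selected + [j])
                  for j in remaining}
        best = min(scores, key=lambda j: scores[j].mean())
        if current is not None:
            # One-sided test: is the current model's CRPS significantly
            # larger than the candidate model's? (Not the paper's test.)
            _, pval = stats.ttest_rel(current, scores[best],
                                      alternative="greater")
            if pval > alpha:
                break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected
```

The loop mirrors the abstract's stepwise logic: each step adds the covariate
that most reduces the estimated CRPS risk, and the procedure terminates as
soon as the decrease fails the significance test, yielding a small,
interpretable set of variables.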
Related papers
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Bagging in overparameterized learning: Risk characterization and risk monotonization [2.6534407766508177]
We study the prediction risk of variants of bagged predictors in the proportional asymptotics regime.
Specifically, we propose a general strategy for analyzing the prediction risk, under squared error loss, of bagged predictors.
arXiv Detail & Related papers (2022-10-20T17:45:58Z)
- Learning from a Biased Sample [3.546358664345473]
We propose a method for learning a decision rule that minimizes the worst-case risk incurred under a family of test distributions.
We empirically validate our proposed method in a case study on prediction of mental health scores from health survey data.
arXiv Detail & Related papers (2022-09-05T04:19:16Z)
- Lazy Estimation of Variable Importance for Large Neural Networks [22.95405462638975]
We propose a fast and flexible method for approximating the reduced model with important inferential guarantees.
We demonstrate our method is fast and accurate under several data-generating regimes, and we demonstrate its real-world applicability on a seasonal climate forecasting example.
arXiv Detail & Related papers (2022-07-19T06:28:17Z)
- Comparing two samples through stochastic dominance: a graphical approach [2.867517731896504]
Non-deterministic measurements are common in real-world scenarios.
We propose an alternative framework to visually compare two samples according to their estimated cumulative distribution functions.
arXiv Detail & Related papers (2022-03-15T13:37:03Z)
- Variational Bayes for high-dimensional proportional hazards models with applications to gene expression variable selection [3.8761064607384195]
We propose a variational Bayesian proportional hazards model for prediction and variable selection with high-dimensional survival data.
Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC.
We demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes.
arXiv Detail & Related papers (2021-12-19T22:10:41Z)
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
- Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z)
- Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation [51.091890311312085]
We propose a new training scheme for auto-regressive sequence generative models that is effective and stable when operating on the large sample spaces encountered in text generation.
Our method stably outperforms Maximum Likelihood Estimation and other state-of-the-art sequence generative models in terms of both quality and diversity.
arXiv Detail & Related papers (2020-07-12T15:31:24Z)
- A One-step Approach to Covariate Shift Adaptation [82.01909503235385]
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
We propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization.
arXiv Detail & Related papers (2020-07-08T11:35:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.