Sequential Permutation Testing of Random Forest Variable Importance Measures
- URL: http://arxiv.org/abs/2206.01284v1
- Date: Thu, 2 Jun 2022 20:16:50 GMT
- Title: Sequential Permutation Testing of Random Forest Variable Importance Measures
- Authors: Alexander Hapfelmeier, Roman Hornung, Bernhard Haller
- Abstract summary: It is proposed here to use sequential permutation tests and sequential p-value estimation to reduce the high computational costs associated with conventional permutation tests.
The results of simulation studies confirm that the theoretical properties of the sequential tests apply.
The numerical stability of the methods is investigated in two additional application studies.
- Score: 68.8204255655161
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Hypothesis testing of random forest (RF) variable importance measures (VIMP)
remains the subject of ongoing research. Among recent developments, heuristic
approaches to parametric testing have been proposed whose distributional
assumptions are based on empirical evidence. Other formal tests under
regularity conditions were derived analytically. However, these approaches can
be computationally expensive or even practically infeasible. This problem also
occurs with non-parametric permutation tests, which are, however,
distribution-free and can generically be applied to any type of RF and VIMP.
Embracing this advantage, it is proposed here to use sequential permutation
tests and sequential p-value estimation to reduce the high computational costs
associated with conventional permutation tests. The popular and widely used
permutation VIMP serves as a practical and relevant application example. The
results of simulation studies confirm that the theoretical properties of the
sequential tests apply, that is, the type-I error probability is controlled at
a nominal level and a high power is maintained with considerably fewer
permutations needed in comparison to conventional permutation testing. The
numerical stability of the methods is investigated in two additional
application studies. In summary, theoretically sound sequential permutation
testing of VIMP is possible at greatly reduced computational costs.
Recommendations for application are given. A corresponding implementation is
provided through the accompanying R package rfvimptest. The approach can also
be easily applied to any kind of prediction model.
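To make the idea of early stopping concrete, the following is a minimal Python sketch of one standard sequential scheme, a Besag-Clifford-style sequential Monte Carlo p-value. It is not taken from the paper or from the rfvimptest package, and it is not claimed to match their exact procedures. The function names, the stopping threshold `h`, the budget `max_perms`, and the toy statistic (an absolute covariance standing in for a permutation VIMP) are all assumptions made for illustration.

```python
import random


def sequential_permutation_pvalue(observed, draw_null_stat, h=10, max_perms=1000):
    """Besag-Clifford-style sequential Monte Carlo p-value (illustrative sketch).

    Draws null statistics one at a time and stops as soon as `h` of them are
    at least as large as `observed`, or after `max_perms` draws.
    Returns (p_value_estimate, number_of_permutations_used).
    """
    exceed = 0
    for n_used in range(1, max_perms + 1):
        if draw_null_stat() >= observed:
            exceed += 1
            if exceed == h:
                # Early stop: the observed value is not extreme; estimate h / l.
                return h / n_used, n_used
    # Budget exhausted: fall back to the conventional permutation estimate.
    return (exceed + 1) / (max_perms + 1), max_perms


def abs_covariance(x, y):
    """Toy test statistic, a stand-in for a variable importance measure."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))) / len(x)


def make_null_drawer(x, y, rng):
    """Return a function that recomputes the statistic after permuting x."""
    def draw():
        shuffled = x[:]
        rng.shuffle(shuffled)
        return abs_covariance(shuffled, y)
    return draw


rng = random.Random(0)
signal = [rng.gauss(0, 1) for _ in range(200)]  # informative predictor
noise = [rng.gauss(0, 1) for _ in range(200)]   # uninformative predictor
y = [2.0 * s + rng.gauss(0, 1) for s in signal]

p_signal, n_signal = sequential_permutation_pvalue(
    abs_covariance(signal, y), make_null_drawer(signal, y, rng))
p_noise, n_noise = sequential_permutation_pvalue(
    abs_covariance(noise, y), make_null_drawer(noise, y, rng))

print("signal:", p_signal, "permutations used:", n_signal)
print("noise: ", p_noise, "permutations used:", n_noise)
```

In this sketch the savings come from uninformative predictors: their null statistics exceed the observed value often, so the loop tends to stop after a small number of permutations, while clearly informative predictors use the full budget and receive the usual permutation p-value estimate.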
Related papers
- Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering [55.15192437680943]
Generative models lack rigorous statistical guarantees for their outputs.
We propose a sequential conformal prediction method producing prediction sets that satisfy a rigorous statistical guarantee.
This guarantee states that with high probability, the prediction sets contain at least one admissible (or valid) example.
arXiv Detail & Related papers (2024-10-02T15:26:52Z)
- Max-Rank: Efficient Multiple Testing for Conformal Prediction [43.56898111853698]
Multiple hypothesis testing (MHT) commonly arises in various scientific fields, from genomics to psychology, where testing many hypotheses simultaneously increases the risk of Type-I errors.
We propose a novel correction named $\texttt{max-rank}$ that leverages these dependencies, whilst ensuring that the joint Type-I error rate is efficiently controlled.
arXiv Detail & Related papers (2023-11-17T22:44:22Z)
- Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations [44.71254888821376]
We provide the first type-I-error and expected-rejection-time guarantees under general non-parametric data generating processes.
We show how to apply our results to inference on parameters defined by estimating equations, such as average treatment effects.
arXiv Detail & Related papers (2022-12-29T18:37:08Z)
- Variance Minimization in the Wasserstein Space for Invariant Causal Prediction [72.13445677280792]
In this work, we show that the approach taken in ICP may be reformulated as a series of nonparametric tests that scales linearly in the number of predictors.
Each of these tests relies on the minimization of a novel loss function that is derived from tools in optimal transport theory.
We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms.
arXiv Detail & Related papers (2021-10-13T22:30:47Z)
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- Selective Probabilistic Classifier Based on Hypothesis Testing [14.695979686066066]
We propose a simple yet effective method to deal with the violation of the Closed-World Assumption for a classifier.
The proposed method is a rejection option based on hypothesis testing with probabilistic networks.
It is shown that the proposed method can achieve a broader range of operation and cover a lower False Positive Ratio than the alternative.
arXiv Detail & Related papers (2021-05-09T08:55:56Z)
- Asymptotic Validity and Finite-Sample Properties of Approximate Randomization Tests [2.28438857884398]
Our key theoretical contribution is a non-asymptotic bound on the discrepancy between the size of an approximate randomization test and the size of the original randomization test using noiseless data.
We illustrate our theory through several examples, including tests of significance in linear regression.
arXiv Detail & Related papers (2019-08-12T16:09:15Z)