Integrated path stability selection
- URL: http://arxiv.org/abs/2403.15877v2
- Date: Tue, 27 Aug 2024 01:14:57 GMT
- Title: Integrated path stability selection
- Authors: Omar Melikechi, Jeffrey W. Miller,
- Abstract summary: We introduce a novel approach to stability selection based on integrating stability paths rather than maximizing over them.
This yields upper bounds on E(FP) that are orders of magnitude stronger than previous bounds, leading to significantly more true positives in practice for the same target E(FP)
We demonstrate the method on simulations and real data from prostate and colon cancer studies.
- Score: 5.263910852465186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stability selection is a popular method for improving feature selection algorithms. One of its key attributes is that it provides theoretical upper bounds on the expected number of false positives, E(FP), enabling control of false positives in practice. However, stability selection often selects very few features, resulting in low sensitivity. This is because existing bounds on E(FP) are relatively loose, causing stability selection to overestimate the number of false positives. In this paper, we introduce a novel approach to stability selection based on integrating stability paths rather than maximizing over them. This yields upper bounds on E(FP) that are orders of magnitude stronger than previous bounds, leading to significantly more true positives in practice for the same target E(FP). Furthermore, our method takes the same amount of computation as the original stability selection algorithm, and only requires one user-specified parameter, which can be either the target E(FP) or target false discovery rate. We demonstrate the method on simulations and real data from prostate and colon cancer studies.
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - Fast nonparametric feature selection with error control using integrated path stability selection [12.608885112539202]
We introduce a general feature selection method that applies integrated path stability selection to thresholding to control false positives and the false discovery rate.
We focus on two special cases of the general method based on gradient boosting (IPSSGB) and random forests (IPSSRF)
Extensive simulations with RNA sequencing data show that IPSSGB and IPSSRF have better error control, detect more true positives, and are faster than existing methods.
arXiv Detail & Related papers (2024-10-03T04:42:28Z) - Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust Sequential Quadratic Programming method to find both first and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a approximation of the objective subject.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Information-Theoretic Safe Bayesian Optimization [59.758009422067005]
We consider a sequential decision making task, where the goal is to optimize an unknown function without evaluating parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
arXiv Detail & Related papers (2024-02-23T14:31:10Z) - Fully Stochastic Trust-Region Sequential Quadratic Programming for
Equality-Constrained Optimization Problems [62.83783246648714]
We propose a sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z) - On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks.
We show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them.
Despite the effectiveness of sparsity grounded by our theory, it still remains an open problem of how to choose the tunable parameters.
arXiv Detail & Related papers (2022-11-28T17:41:48Z) - Kernel Robust Hypothesis Testing [20.78285964841612]
In this paper, uncertainty sets are constructed in a data-driven manner using kernel method.
The goal is to design a test that performs well under the worst-case distributions over the uncertainty sets.
For the Neyman-Pearson setting, the goal is to minimize the worst-case probability of miss detection subject to a constraint on the worst-case probability of false alarm.
arXiv Detail & Related papers (2022-03-23T23:59:03Z) - Loss-guided Stability Selection [0.0]
It is well-known that model selection procedures like the Lasso or Boosting tend to overfit on real data.
Standard Stability Selection is based on a global criterion, namely the per-family error rate.
We propose a Stability Selection variant which respects the chosen loss function via an additional validation step.
arXiv Detail & Related papers (2022-02-10T11:20:25Z) - Sparse Bayesian Learning via Stepwise Regression [1.2691047660244335]
We propose a coordinate ascent algorithm for SBL termed Relevance Matching Pursuit (RMP)
As its noise variance parameter goes to zero, RMP exhibits a surprising connection to Stepwise Regression.
We derive novel guarantees for Stepwise Regression algorithms, which also shed light on RMP.
arXiv Detail & Related papers (2021-06-11T00:20:27Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.