Cluster Stability Selection
- URL: http://arxiv.org/abs/2201.00494v1
- Date: Mon, 3 Jan 2022 06:28:17 GMT
- Title: Cluster Stability Selection
- Authors: Gregory Faletto, Jacob Bien
- Abstract summary: Stability selection makes any feature selection method more stable by returning only those features that are consistently selected across many subsamples.
We introduce cluster stability selection, which exploits the practitioner's knowledge that highly correlated clusters exist in the data.
In summary, cluster stability selection enjoys the best of both worlds, yielding a sparse selected set that is both stable and has good predictive performance.
- Score: 2.3986080077861787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stability selection (Meinshausen and Bühlmann, 2010) makes any feature
selection method more stable by returning only those features that are
consistently selected across many subsamples. We prove (in what is, to our
knowledge, the first result of its kind) that for data containing highly
correlated proxies for an important latent variable, the lasso typically
selects one proxy, yet stability selection with the lasso can fail to select
any proxy, leading to worse predictive performance than the lasso alone.
We introduce cluster stability selection, which exploits the practitioner's
knowledge that highly correlated clusters exist in the data, resulting in
better feature rankings than stability selection in this setting. We consider
several feature-combination approaches, including taking a weighted average of
the features in each important cluster where weights are determined by the
frequency with which cluster members are selected, which we show leads to
better predictive models than previous proposals.
We present generalizations of theoretical guarantees from Meinshausen and
Bühlmann (2010) and Shah and Samworth (2012) to show that cluster stability
selection retains the same guarantees. In summary, cluster stability selection
enjoys the best of both worlds, yielding a sparse selected set that is both
stable and has good predictive performance.
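The procedure in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: a simple correlation screen stands in for the lasso, the function names are invented, and the subsampling and aggregation schemes are simplified.

```python
# Sketch of cluster stability selection (illustrative, not the paper's code).
# Features are repeatedly selected on random subsamples; per-feature selection
# frequencies are then aggregated within known clusters, and an important
# cluster is represented by a frequency-weighted average of its members.
import numpy as np

def selection_frequencies(X, y, n_subsamples=50, n_keep=2, seed=0):
    """Fraction of subsamples on which each feature is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        # Base selector: keep the n_keep features most correlated with y
        # (standing in for the lasso).
        corr = np.abs(X[idx].T @ y[idx])
        counts[np.argsort(corr)[-n_keep:]] += 1
    return counts / n_subsamples

def cluster_frequencies(freqs, clusters):
    """Aggregate per-feature frequencies into per-cluster scores.

    With highly correlated proxies, individual selection frequencies are
    diluted across cluster members; pooling them (capped at 1) recovers a
    stable cluster-level signal, in the spirit of the paper.
    """
    return {name: min(1.0, sum(freqs[j] for j in members))
            for name, members in clusters.items()}

def weighted_cluster_feature(X, freqs, members):
    """Weighted average of a cluster's features, weighted by selection frequency."""
    w = np.array([freqs[j] for j in members], dtype=float)
    w = w / w.sum() if w.sum() > 0 else np.ones(len(members)) / len(members)
    return X[:, members] @ w
```

On data where two noisy proxies of one latent variable compete for selection, each proxy's individual frequency can be low even though the cluster is selected on every subsample; the cluster-level score makes that visible.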
Related papers
- Stability and Multigroup Fairness in Ranking with Uncertain Predictions [61.76378420347408]
Our work considers ranking functions: maps from individual predictions for a classification task to distributions over rankings.
We focus on two aspects of ranking functions: stability to perturbations in predictions and fairness towards both individuals and subgroups.
Our work demonstrates that uncertainty aware rankings naturally interpolate between group and individual level fairness guarantees.
arXiv Detail & Related papers (2024-02-14T17:17:05Z) - An information theoretic approach to quantify the stability of feature selection and ranking algorithms [0.0]
We propose an information theoretic approach based on the Jensen-Shannon divergence to quantify this robustness.
Unlike other stability measures, this metric is suitable for different algorithm outcomes: full ranked lists, feature subsets as well as the lesser studied partial ranked lists.
We illustrate the use of this stability metric with data generated in a fully controlled way and compare it with popular metrics including Spearman's rank correlation and Kuncheva's index on feature ranking and selection outcomes, respectively.
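To make the idea concrete, here is an illustrative sketch (not the paper's exact estimator): each run's selected feature subset is mapped to a distribution over the feature set, and stability is one minus the mean pairwise Jensen-Shannon divergence across runs.

```python
# Illustrative stability score based on the Jensen-Shannon divergence:
# identical selection outcomes give 1, disjoint outcomes give 0.
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def subset_to_dist(subset, p):
    """Uniform distribution over a selected subset of p features."""
    k = len(subset)
    return [1.0 / k if j in subset else 0.0 for j in range(p)]

def stability(subsets, p):
    """1 minus the mean pairwise JS divergence across selection runs."""
    pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    if not pairs:
        return 1.0
    mean_js = sum(js_divergence(subset_to_dist(a, p), subset_to_dist(b, p))
                  for a, b in pairs) / len(pairs)
    return 1.0 - mean_js
```

The same construction extends to full or partial ranked lists by replacing the uniform subset distribution with rank-derived weights, which is the flexibility the abstract highlights.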
arXiv Detail & Related papers (2024-02-07T22:17:37Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction [76.26710990597498]
We show that the class-imbalance of the grouped data from randomly selected clients can lead to significant performance degradation.
Based on our key observation, we design an efficient client sampling mechanism, i.e., Federated Class-balanced Sampling (Fed-CBS).
In particular, we propose a measure of class-imbalance and then employ homomorphic encryption to derive this measure in a privacy-preserving way.
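The abstract does not spell out the measure, so the following is only a plaintext sketch of one natural choice (the encryption step is omitted entirely): the squared distance between the label distribution of the grouped client data and the uniform distribution, which is 0 when the pooled data is perfectly class-balanced.

```python
# Hypothetical class-imbalance measure for a group of selected clients
# (illustrative; the paper derives its measure under homomorphic encryption).

def class_imbalance(client_label_counts, n_classes):
    """Squared L2 distance between the pooled label distribution and uniform.

    client_label_counts: list of per-client dicts {class_label: count}.
    Returns 0.0 when the grouped data is perfectly balanced.
    """
    totals = [0] * n_classes
    for counts in client_label_counts:
        for c, k in counts.items():
            totals[c] += k
    n = sum(totals)
    uniform = 1.0 / n_classes
    return sum((t / n - uniform) ** 2 for t in totals)
```

A sampler in this spirit would prefer client groups that minimize this quantity rather than sampling clients uniformly at random.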
arXiv Detail & Related papers (2022-09-30T05:42:56Z) - Loss-guided Stability Selection [0.0]
It is well-known that model selection procedures like the Lasso or Boosting tend to overfit on real data.
Standard Stability Selection is based on a global criterion, namely the per-family error rate.
We propose a Stability Selection variant which respects the chosen loss function via an additional validation step.
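The validation step can be sketched as follows. This is hypothetical code, not the paper's: given per-feature selection frequencies from stability selection, the frequency threshold is chosen by held-out loss (here squared error with a least-squares refit) rather than a global per-family error-rate criterion.

```python
# Loss-guided threshold choice for stability selection (illustrative sketch).
import numpy as np

def pick_threshold(freqs, X_tr, y_tr, X_val, y_val,
                   grid=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Return the selection-frequency threshold with the best validation loss."""
    best_t, best_loss = None, float("inf")
    for t in grid:
        sel = np.flatnonzero(freqs >= t)
        if sel.size == 0:
            continue  # no features survive this threshold
        # Refit a least-squares model on the selected features (illustrative;
        # any model matching the chosen loss could be used here).
        beta, *_ = np.linalg.lstsq(X_tr[:, sel], y_tr, rcond=None)
        loss = np.mean((y_val - X_val[:, sel] @ beta) ** 2)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t, best_loss
```

The point of the variant is that the threshold now answers to the same loss the final model is judged on, instead of a one-size-fits-all error-rate bound.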
arXiv Detail & Related papers (2022-02-10T11:20:25Z) - Fast Estimation Method for the Stability of Ensemble Feature Selectors [8.984888893275714]
It is preferred that feature selectors be stable for better interpretability and robust prediction.
We propose a simulator of a feature selector, and apply it to a fast estimation of the stability of ensemble feature selectors.
arXiv Detail & Related papers (2021-08-03T13:22:18Z) - Employing an Adjusted Stability Measure for Multi-Criteria Model Fitting on Data Sets with Similar Features [0.1127980896956825]
We show that our approach achieves the same or better predictive performance compared to the two established approaches.
Our approach succeeds at selecting the relevant features while avoiding irrelevant or redundant features.
For data sets with many similar features, the feature selection stability must be evaluated with an adjusted stability measure.
arXiv Detail & Related papers (2021-06-15T12:48:07Z) - Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z) - Selective Classification Can Magnify Disparities Across Groups [89.14499988774985]
We find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities.
Increasing abstentions can even decrease accuracies on some groups.
We train distributionally-robust models that achieve similar full-coverage accuracies across groups and show that selective classification uniformly improves each group.
arXiv Detail & Related papers (2020-10-27T08:51:30Z) - Selecting the Number of Clusters $K$ with a Stability Trade-off: an Internal Validation Criterion [0.0]
Clustering stability has emerged as a natural and model-agnostic principle.
We propose a new principle: a good clustering should be stable, and within each cluster, there should exist no stable partition.
arXiv Detail & Related papers (2020-06-15T16:38:48Z) - Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis.
We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.
We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.