Adjusted Measures for Feature Selection Stability for Data Sets with
Similar Features
- URL: http://arxiv.org/abs/2009.12075v1
- Date: Fri, 25 Sep 2020 07:52:19 GMT
- Authors: Andrea Bommert and Jörg Rahnenführer
- Abstract summary: We introduce new adjusted stability measures that overcome the drawbacks of existing measures.
Based on the results, we suggest using one new stability measure that considers highly similar features as exchangeable.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For data sets with similar features, for example highly correlated features,
most existing stability measures behave in an undesirable way: they consider
features that are almost identical but have different identifiers as different
features. Existing adjusted stability measures, that is, stability measures
that take into account the similarities between features, have major
theoretical drawbacks. We introduce new adjusted stability measures that
overcome these drawbacks. We compare them to each other and to existing
stability measures based on both artificial and real sets of selected features.
Based on the results, we suggest using one new stability measure that considers
highly similar features as exchangeable.
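As an illustration of the idea in the abstract, the sketch below scores the agreement of two selected feature sets while treating highly similar features as exchangeable. It is a simplified, hypothetical variant: the function name, the similarity-matrix input, and the counting scheme are my assumptions, not the paper's exact definition.

```python
import numpy as np

def adjusted_stability(selected_a, selected_b, similarity, threshold=0.9):
    """Illustrative pairwise stability score for two selected feature sets.

    Features that are not shared by identifier but have a highly similar
    counterpart (similarity >= threshold) in the other set are counted as
    exchangeable. This is a simplified sketch, not the paper's measure.
    """
    a, b = set(selected_a), set(selected_b)
    shared = len(a & b)
    # Count features unique to one set that have a highly similar
    # counterpart among the features unique to the other set.
    matched = sum(
        1 for i in a - b if any(similarity[i, j] >= threshold for j in b - a)
    )
    matched += sum(
        1 for j in b - a if any(similarity[i, j] >= threshold for i in a - b)
    )
    denom = len(a) + len(b)
    return (2 * shared + matched) / denom if denom else 1.0
```

For instance, if features 1 and 2 are near-duplicates (similarity 0.95) and one run selects {0, 1} while another selects {0, 2}, this adjusted score is 1.0, whereas the unadjusted overlap-based score would be 0.5 — exactly the "different identifiers, same information" problem described above.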
Related papers
- Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density [93.32594873253534]
Trustworthy machine learning requires meticulous regulation of model reliance on non-robust features.
We propose a framework to delineate and regulate such features by attributing model predictions to the input.
arXiv Detail & Related papers (2024-07-05T09:16:56Z)
- Towards Stable 3D Object Detection [64.49059005467817]
Stability Index (SI) is a new metric that can comprehensively evaluate the stability of 3D detectors in terms of confidence, box localization, extent, and heading.
To help models improve their stability, we introduce a general and effective training strategy called Prediction Consistency Learning (PCL).
PCL essentially encourages the prediction consistency of the same objects under different timestamps and augmentations, leading to enhanced detection stability.
arXiv Detail & Related papers (2024-07-05T07:17:58Z)
- Stable Update of Regression Trees [0.0]
We focus on the stability of an inherently explainable machine learning method, namely regression trees.
We propose a regularization method, where data points are weighted based on the uncertainty in the initial model.
Results show that the proposed update method improves stability while achieving similar or better predictive performance.
arXiv Detail & Related papers (2024-02-21T09:41:56Z)
- An information theoretic approach to quantify the stability of feature selection and ranking algorithms [0.0]
We propose an information-theoretic approach based on the Jensen-Shannon divergence to quantify this robustness.
Unlike other stability measures, this metric is suitable for different algorithm outcomes: full ranked lists, feature subsets as well as the lesser studied partial ranked lists.
We illustrate the use of this stability metric with data generated in a fully controlled way and compare it with popular metrics, including Spearman's rank correlation and Kuncheva's index, on feature ranking and selection outcomes, respectively.
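The Kuncheva index referenced in this entry has a well-known closed form for two equal-sized feature subsets; a minimal implementation of that standard definition (the function name is mine):

```python
def kuncheva_index(a, b, n_features):
    """Kuncheva's consistency index for two feature subsets of equal
    size k drawn from n_features features in total. It corrects the raw
    overlap for the overlap expected by chance; requires 0 < k < n_features
    to avoid a zero denominator."""
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k:
        raise ValueError("Kuncheva's index assumes equal-sized subsets")
    r = len(a & b)  # observed overlap
    # Two random size-k subsets of n features overlap in k^2 / n
    # features on average; subtracting this centers chance agreement at 0.
    return (r * n_features - k * k) / (k * (n_features - k))
```

Identical subsets score 1, while chance-level overlap scores approximately 0, which is what makes the index a useful baseline for stability comparisons.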
arXiv Detail & Related papers (2024-02-07T22:17:37Z)
- Model Merging by Uncertainty-Based Gradient Matching [70.54580972266096]
We propose a new uncertainty-based scheme to improve the performance by reducing the mismatch.
Our new method gives consistent improvements for large language models and vision transformers.
arXiv Detail & Related papers (2023-10-19T15:02:45Z)
- Measuring the Instability of Fine-Tuning [7.370822347217826]
Fine-tuning pre-trained language models on downstream tasks with varying random seeds has been shown to be unstable.
In this paper, we analyze the standard deviation (SD) and six other measures quantifying instability at different levels of granularity.
arXiv Detail & Related papers (2023-02-15T16:55:15Z)
- Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees [57.67528738886731]
We study the numerical stability of scalable sparse approximations based on inducing points.
For low-dimensional tasks such as geospatial modeling, we propose an automated method for computing inducing points satisfying these conditions.
arXiv Detail & Related papers (2022-10-14T15:20:17Z)
- PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures [65.36234499099294]
We propose a new data augmentation strategy utilizing the natural structural complexity of pictures such as fractals.
arXiv Detail & Related papers (2021-12-09T18:59:31Z)
- Employing an Adjusted Stability Measure for Multi-Criteria Model Fitting on Data Sets with Similar Features [0.1127980896956825]
We show that our approach achieves the same or better predictive performance compared to the two established approaches.
Our approach succeeds at selecting the relevant features while avoiding irrelevant or redundant features.
For data sets with many similar features, the feature selection stability must be evaluated with an adjusted stability measure.
arXiv Detail & Related papers (2021-06-15T12:48:07Z)
- Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately [83.68135652247496]
A natural remedy is to remove spurious features from the model.
We show that removal of spurious features can decrease accuracy due to inductive biases.
We also show that robust self-training can remove spurious features without affecting the overall accuracy.
arXiv Detail & Related papers (2020-12-07T23:08:59Z)
- Characterizing the Stability of NISQ Devices [0.40611352512781856]
We develop the metrics and theoretical framework to quantify the DiVincenzo requirements and study the stability of those key metrics.
For identical experiments, devices which produce reproducible histograms in time, and similar histograms in space, are considered more reliable.
We illustrate our methodology using data collected from IBM's Yorktown device.
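The notion of "reproducible histograms" can be made concrete with a distance between normalized outcome distributions. The Hellinger distance below is one common choice for this; whether it matches the paper's exact metric is an assumption on my part.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions,
    e.g., normalized measurement histograms from two runs of a device.
    Returns 0.0 for identical histograms and 1.0 for disjoint support,
    so smaller values indicate a more reproducible (stable) device."""
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                               for pi, qi in zip(p, q)))
```

Comparing the same experiment's histograms across time (or across qubit subsets) with such a distance gives a single stability number per device, in the spirit of the reliability criterion described above.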
arXiv Detail & Related papers (2020-08-21T15:40:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.