Feature Selection using e-values
- URL: http://arxiv.org/abs/2206.05391v1
- Date: Sat, 11 Jun 2022 01:34:29 GMT
- Title: Feature Selection using e-values
- Authors: Subhabrata Majumdar, Snigdhansu Chatterjee
- Abstract summary: We introduce the concept of e-values in the context of supervised parametric models.
Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not.
We use data depths and a fast resampling-based algorithm to implement a feature selection procedure using e-values.
- Score: 4.3512163406552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of supervised parametric models, we introduce the concept of
e-values. An e-value is a scalar quantity that represents the proximity of the
sampling distribution of parameter estimates in a model trained on a subset of
features to that of the model trained on all features (i.e. the full model).
Under general conditions, a rank ordering of e-values separates models that
contain all essential features from those that do not.
The e-values are applicable to a wide range of parametric models. We use data
depths and a fast resampling-based algorithm to implement a feature selection
procedure using e-values, providing consistency results. For a $p$-dimensional
feature space, this procedure requires fitting only the full model and
evaluating $p+1$ models, as opposed to the traditional requirement of fitting
and evaluating $2^p$ models. Through experiments across several model settings
and on both synthetic and real datasets, we establish the e-values method as a
promising general alternative to existing model-specific methods of feature
selection.
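The procedure described in the abstract (fit the full model once, then evaluate $p+1$ models by resampling) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the model is ordinary least squares, the data depth is Mahalanobis depth, the resampling step is a multiplier-style perturbation of the residuals with a hypothetical `spread` inflation factor, and a feature is kept when zeroing its coefficient yields an e-value below that of the full model. The names `mahalanobis_depth` and `select_features` are likewise hypothetical.

```python
import numpy as np


def mahalanobis_depth(points, reference):
    """Mahalanobis depth of each row of `points` relative to the sample `reference`."""
    mu = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    diff = points - mu
    # Squared Mahalanobis distance of every row, computed in one pass.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)
    return 1.0 / (1.0 + d2)


def select_features(X, y, n_draws=500, spread=3.0, seed=0):
    """Sketch of e-value feature selection for a linear model (illustrative only).

    Assumptions of this sketch, not of the paper: OLS fit, Mahalanobis depth,
    multiplier-style resampling inflated by `spread`.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Fit the full model once; no reduced model is ever refitted.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    proj = np.linalg.pinv(X)  # equals (X'X)^{-1} X' when X has full column rank

    def draw():
        # Perturb residuals with random multipliers and map them to coefficients.
        w = rng.standard_normal((n_draws, n))
        return beta_hat + spread * (w * resid) @ proj.T

    reference = draw()  # approximates the full-model sampling distribution
    draws = draw()      # independent draws that get evaluated against it

    # e-value of the full model: mean depth of its own resampled estimates.
    e_full = mahalanobis_depth(draws, reference).mean()

    # e-values of the p drop-one models: zero out one coefficient at a time.
    e_drop = np.empty(p)
    for j in range(p):
        reduced = draws.copy()
        reduced[:, j] = 0.0
        e_drop[j] = mahalanobis_depth(reduced, reference).mean()

    # Keep feature j if removing it pushes the estimates away from the reference.
    selected = np.flatnonzero(e_drop < e_full)
    return selected, e_full, e_drop


# Usage on synthetic data where features 0, 1 and 4 carry signal.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 6))
beta = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0])
y = X @ beta + rng.standard_normal(300)
selected, e_full, e_drop = select_features(X, y)
print("selected features:", selected)
print("full-model e-value:", e_full)
print("drop-one e-values:", e_drop)
```

In this sketch the cost is one least-squares fit plus $p+1$ depth evaluations, which mirrors the complexity claim in the abstract. How noise features near the threshold are classified depends on the assumed `spread` value, so the output should be read as illustrative rather than as a reproduction of the paper's results.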
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- REFRESH: Responsible and Efficient Feature Reselection Guided by SHAP Values [17.489279048199304]
REFRESH is a method to reselect features so that additional constraints that are desirable towards model performance can be achieved without having to train several new models.
REFRESH's underlying algorithm is a novel technique using SHAP values and correlation analysis that can approximate the predictions of a model without having to train it.
arXiv Detail & Related papers (2024-03-13T18:06:43Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- Binary Feature Mask Optimization for Feature Selection [0.0]
We introduce a novel framework that selects features considering the predictions of the model.
Our framework innovates by using a novel feature masking approach to eliminate the features during the selection process.
We demonstrate significant performance improvements on real-life datasets using LightGBM and Multi-Layer Perceptron as our ML models.
arXiv Detail & Related papers (2024-01-23T10:54:13Z)
- The Interpolating Information Criterion for Overparameterized Models [49.283527214211446]
We show that the Interpolating Information Criterion is a measure of model quality that naturally incorporates the choice of prior into the model selection.
Our new information criterion accounts for prior misspecification, geometric and spectral properties of the model, and is numerically consistent with known empirical and theoretical behavior.
arXiv Detail & Related papers (2023-07-15T12:09:54Z)
- Evaluating Representations with Readout Model Switching [18.475866691786695]
In this paper, we propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric.
We design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions.
The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures.
arXiv Detail & Related papers (2023-02-19T14:08:01Z)
- PAMI: partition input and aggregate outputs for model interpretation [69.42924964776766]
In this study, a simple yet effective visualization framework called PAMI is proposed based on the observation that deep learning models often aggregate features from local regions for model predictions.
The basic idea is to mask the majority of the input and use the corresponding model output as the relative contribution of the preserved input part to the original model prediction.
Extensive experiments on multiple tasks confirm the proposed method performs better than existing visualization approaches in more precisely finding class-specific input regions.
arXiv Detail & Related papers (2023-02-07T08:48:34Z)
- Optimizing model-agnostic Random Subspace ensembles [5.680512932725364]
We present a model-agnostic ensemble approach for supervised learning.
The proposed approach alternates between learning an ensemble of models using a parametric version of the Random Subspace approach.
We show the good performance of the proposed approach, both in terms of prediction and feature ranking, on simulated and real-world datasets.
arXiv Detail & Related papers (2021-09-07T13:58:23Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
- Gaussian Function On Response Surface Estimation [12.35564140065216]
We propose a new framework for interpreting (features and samples) black-box machine learning models via a metamodeling technique.
The metamodel can be estimated from data generated via a trained complex model by running the computer experiment on samples of data in the region of interest.
arXiv Detail & Related papers (2021-01-04T04:47:00Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.