Related papers: Contextual Feature Selection with Conditional Stochastic Gates

Contextual Feature Selection with Conditional Stochastic Gates

URL: http://arxiv.org/abs/2312.14254v2
Date: Fri, 7 Jun 2024 19:01:08 GMT
Title: Contextual Feature Selection with Conditional Stochastic Gates
Authors: Ram Dyuthi Sristi, Ofir Lindenbaum, Shira Lifshitz, Maria Lavzin, Jackie Schiller, Gal Mishne, Hadas Benisty,
Abstract summary: Conditional Gates (c-STG) models the importance of features using conditional variables whose parameters are predicted based on contextual variables. We show that c-STG can lead to improved feature selection capabilities while enhancing prediction accuracy and interpretability.
Score: 9.784482648233048
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Feature selection is a crucial tool in machine learning and is widely applied across various scientific disciplines. Traditional supervised methods generally identify a universal set of informative features for the entire population. However, feature relevance often varies with context, while the context itself may not directly affect the outcome variable. Here, we propose a novel architecture for contextual feature selection where the subset of selected features is conditioned on the value of context variables. Our new approach, Conditional Stochastic Gates (c-STG), models the importance of features using conditional Bernoulli variables whose parameters are predicted based on contextual variables. We introduce a hypernetwork that maps context variables to feature selection parameters to learn the context-dependent gates along with a prediction model. We further present a theoretical analysis of our model, indicating that it can improve performance and flexibility over population-level methods in complex feature selection settings. Finally, we conduct an extensive benchmark using simulated and real-world datasets across multiple domains demonstrating that c-STG can lead to improved feature selection capabilities while enhancing prediction accuracy and interpretability.

Related papers

Variational Garrote for Statistical Physics-based Sparse and Robust Variable Selection [8.312621461460148]
We revisit the statistical physics-based Variational Garrote (VG) method, which introduces explicit feature selection spin variables.<n>We evaluate VG on both fully controllable synthetic datasets and complex real-world datasets.<n> VG offers strong potential for sparse modeling across a wide range of applications, including compressed sensing and model pruning in machine learning.
arXiv Detail & Related papers (2025-09-08T07:06:10Z)
Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models [68.57424628540907]
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets.<n>We introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms.<n>Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance.
arXiv Detail & Related papers (2025-07-12T08:10:10Z)
Knoop: Practical Enhancement of Knockoff with Over-Parameterization for Variable Selection [27.563529091471935]
This work introduces a novel approach namely Knockoff with over- parameterization (Knoop) to enhance variable selection. Knoop generates multiple knockoff variables for each original variable and integrates them with the original variables into a Ridgeless regression model. Experiments demonstrate superior performance compared to existing methods in both simulation and real-world datasets.
arXiv Detail & Related papers (2025-01-28T09:27:04Z)
Explaining Datasets in Words: Statistical Models with Natural Language Parameters [66.69456696878842]
We introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates. We apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, finding categories where one language model is better than the other.
arXiv Detail & Related papers (2024-09-13T01:40:20Z)
LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science. Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
Context-aware feature attribution through argumentation [0.0]
We define a novel feature attribution framework called Context-Aware Feature Attribution Through Argumentation (CA-FATA) Our framework harnesses the power of argumentation by treating each feature as an argument that can either support, attack or neutralize a prediction.
arXiv Detail & Related papers (2023-10-24T20:02:02Z)
Scalable variable selection for two-view learning tasks with projection operators [0.0]
We propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems. Our framework is able to handle extremely large scale selection tasks, where number of data samples could be even millions.
arXiv Detail & Related papers (2023-07-04T08:22:05Z)
An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system. Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches. This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders [0.0]
We argue that continuous variables may not be ideal to model features of textual data, due to the fact that most generative factors in text are discrete. We propose a Variational Autoencoder based method which models language features as discrete variables and encourages independence between variables for learning disentangled representations.
arXiv Detail & Related papers (2021-09-15T09:10:05Z)
Dynamic Instance-Wise Classification in Correlated Feature Spaces [15.351282873821935]
In a typical machine learning setting, the predictions on all test instances are based on a common subset of features discovered during model training. A new method is proposed that sequentially selects the best feature to evaluate for each test instance individually, and stops the selection process to make a prediction once it determines that no further improvement can be achieved with respect to classification accuracy. The effectiveness, generalizability, and scalability of the proposed method is illustrated on a variety of real-world datasets from diverse application domains.
arXiv Detail & Related papers (2021-06-08T20:20:36Z)
Top-$k$ Regularization for Supervised Feature Selection [11.927046591097623]
We introduce a novel, simple yet effective regularization approach, named top-$k$ regularization, to supervised feature selection. We show that the top-$k$ regularization is effective and stable for supervised feature selection.
arXiv Detail & Related papers (2021-06-04T01:12:47Z)
Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters. We infer the posteriors over such latent variables based on data from seen task-language combinations. Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.