Estimating Conditional Mutual Information for Dynamic Feature Selection
- URL: http://arxiv.org/abs/2306.03301v3
- Date: Sun, 8 Sep 2024 17:44:14 GMT
- Title: Estimating Conditional Mutual Information for Dynamic Feature Selection
- Authors: Soham Gadgil, Ian Covert, Su-In Lee
- Abstract summary: Dynamic feature selection is a promising paradigm to reduce feature acquisition costs and provide transparency into a model's predictions.
Here, we take an information-theoretic perspective and prioritize features based on their mutual information with the response variable.
Our method provides consistent gains over recent methods across a variety of datasets.
- Score: 14.706269510726356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dynamic feature selection, where we sequentially query features to make accurate predictions with a minimal budget, is a promising paradigm to reduce feature acquisition costs and provide transparency into a model's predictions. The problem is challenging, however, as it requires both predicting with arbitrary feature sets and learning a policy to identify valuable selections. Here, we take an information-theoretic perspective and prioritize features based on their mutual information with the response variable. The main challenge is implementing this policy, and we design a new approach that estimates the mutual information in a discriminative rather than generative fashion. Building on our approach, we then introduce several further improvements: allowing variable feature budgets across samples, enabling non-uniform feature costs, incorporating prior information, and exploring modern architectures to handle partial inputs. Our experiments show that our method provides consistent gains over recent methods across a variety of datasets.
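To make the greedy rule concrete: with features x_S already observed, the policy acquires the feature i that maximizes the conditional mutual information I(Y; X_i | x_S). The sketch below illustrates one discriminative way to estimate this quantity in the spirit of the abstract; the architectures, masking scheme, and training targets are assumptions for illustration, not the paper's exact method. A predictor handles masked inputs, and a value network regresses each candidate feature's one-step reduction in cross-entropy loss, whose expectation tracks the CMI.

```python
# Hedged sketch of discriminative CMI estimation for greedy dynamic feature
# selection. Architectures, masking scheme, and training loop are assumptions
# for illustration, not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d, n_classes, n = 8, 2, 512

def mask_input(x, m):
    # Zero-impute unobserved features and append the mask so the networks
    # can tell a missing value from an observed zero.
    return torch.cat([x * m, m], dim=-1)

predictor = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, n_classes))
value_net = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, d))
opt = torch.optim.Adam([*predictor.parameters(), *value_net.parameters()], lr=1e-3)

# Toy data: the label depends only on the sign of feature 0.
x = torch.randn(n, d)
y = (x[:, 0] > 0).long()

for step in range(300):
    m = (torch.rand(n, d) < 0.5).float()            # random observed sets S
    loss_s = F.cross_entropy(predictor(mask_input(x, m)), y, reduction="none")

    i = torch.randint(d, (n,))                      # one candidate feature per sample
    m_i = m.clone()
    m_i[torch.arange(n), i] = 1.0
    with torch.no_grad():
        loss_si = F.cross_entropy(predictor(mask_input(x, m_i)), y, reduction="none")

    # One-step loss reduction: a noisy regression target whose expectation
    # tracks I(Y; X_i | x_S); clamping to >= 0 is an extra assumption here.
    target = (loss_s.detach() - loss_si).clamp(min=0.0)
    cmi_hat = value_net(mask_input(x, m))[torch.arange(n), i]

    loss = loss_s.mean() + F.mse_loss(cmi_hat, target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Greedy acquisition: with nothing observed, the estimated-CMI argmax should
# point at feature 0 on this toy problem.
m0 = torch.zeros(1, d)
scores = value_net(mask_input(x[:1], m0)).masked_fill(m0.bool(), -float("inf"))
print("first feature to acquire:", scores.argmax().item())
```

At selection time the value network scores all unobserved features in a single forward pass, so no generative model of p(x_i | x_S) is needed; this is the discriminative shortcut the abstract alludes to.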
Related papers
- Deep Generative Demand Learning for Newsvendor and Pricing [7.594251468240168]
We consider data-driven inventory and pricing decisions in the feature-based newsvendor problem.
We propose a novel approach leveraging conditional deep generative models (cDGMs) to address these challenges.
We provide theoretical guarantees for our approach, including the consistency of profit estimation and convergence of our decisions to the optimal solution.
arXiv Detail & Related papers (2024-11-13T14:17:26Z)
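The entry above describes its approach only at a high level; as a hedged illustration of how a conditional generative model can drive the newsvendor decision, the sketch below samples demand scenarios conditioned on features and orders at the classical critical quantile cu/(cu+co). The demand sampler and cost values are hypothetical stand-ins, not details from the paper.

```python
# Illustrative only: a feature-conditioned demand sampler standing in for a
# trained conditional deep generative model (cDGM). The quantile decision
# rule is the classical newsvendor solution, not a detail from the paper.
import numpy as np

rng = np.random.default_rng(0)

def sample_demand(features, n_samples=10_000):
    # Hypothetical stand-in for cDGM sampling: demand depends on features.
    mean = 50 + 10 * features.sum()
    return rng.gamma(shape=4.0, scale=mean / 4.0, size=n_samples)

def newsvendor_order(features, underage_cost=4.0, overage_cost=1.0):
    # The optimal order quantity is the critical quantile of the demand
    # distribution, F^{-1}(cu / (cu + co)), estimated here from samples.
    demands = sample_demand(features)
    critical_ratio = underage_cost / (underage_cost + overage_cost)
    return np.quantile(demands, critical_ratio)

print(newsvendor_order(np.array([0.3, -0.1, 0.5])))
```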
- Pattern based learning and optimisation through pricing for bin packing problem [50.83768979636913]
We argue that when problem conditions such as the distributions of random variables change, the patterns that performed well in previous circumstances may become less effective.
We propose a novel scheme to efficiently identify patterns and dynamically quantify their values for each specific condition.
Our method quantifies the value of patterns based on their ability to satisfy constraints and their effects on the objective value.
arXiv Detail & Related papers (2024-08-27T17:03:48Z)
- Enhancing Neural Subset Selection: Integrating Background Information into Set Representations [53.15923939406772]
We show that when the target value is conditioned on both the input set and subset, it is essential to incorporate an invariant sufficient statistic of the superset into the subset of interest.
This ensures that the output value remains invariant to permutations of the subset and its corresponding superset, enabling identification of the specific superset from which the subset originated.
arXiv Detail & Related papers (2024-02-05T16:09:35Z)
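A minimal sketch of the mechanism described in the entry above, with the architecture assumed rather than taken from the paper: embed both sets, pool each with a permutation-invariant statistic (here a mean), and condition the subset representation on the superset statistic.

```python
# Sketch: permutation-invariant subset representation conditioned on an
# invariant statistic of its superset (DeepSets-style mean pooling; the
# architecture is an assumption, not the paper's).
import torch
import torch.nn as nn

class SubsetWithSupersetContext(nn.Module):
    def __init__(self, d_in, d_hidden=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.head = nn.Linear(2 * d_hidden, 1)

    def forward(self, subset, superset):
        # Mean pooling is invariant to permutations of either set.
        subset_stat = self.embed(subset).mean(dim=-2)
        superset_stat = self.embed(superset).mean(dim=-2)
        return self.head(torch.cat([subset_stat, superset_stat], dim=-1))

model = SubsetWithSupersetContext(d_in=5)
superset = torch.randn(10, 5)           # a 10-element superset
subset = superset[:4]                   # a 4-element subset of it
print(model(subset, superset).item())   # unchanged under any permutation
```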
- Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Learning to Maximize Mutual Information for Dynamic Feature Selection [13.821253491768168]
We consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information.
We explore a simpler approach of greedily selecting features based on their conditional mutual information.
The proposed method is shown to recover the greedy policy when trained to optimality, and it outperforms numerous existing feature selection methods in our experiments.
arXiv Detail & Related papers (2023-01-02T08:31:56Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- VFDS: Variational Foresight Dynamic Selection in Bayesian Neural Networks for Efficient Human Activity Recognition [81.29900407096977]
Variational Foresight Dynamic Selection (VFDS) learns a policy that selects the next feature subset to observe.
We apply VFDS to the Human Activity Recognition (HAR) task, where the performance-cost trade-off is critical in practice.
arXiv Detail & Related papers (2022-03-31T22:52:43Z)
- Model-agnostic and Scalable Counterfactual Explanations via Reinforcement Learning [0.5729426778193398]
We propose a deep reinforcement learning approach that transforms the optimization procedure into an end-to-end learnable process.
Our experiments on real-world data show that our method is model-agnostic, relying only on feedback from model predictions.
arXiv Detail & Related papers (2021-06-04T16:54:36Z)
- Feature Selection for Huge Data via Minipatch Learning [0.0]
We propose Stable Minipatch Selection (STAMPS) and Adaptive STAMPS.
STAMPS are meta-algorithms that build ensembles of selection events of base feature selectors trained on tiny, possibly-adaptive random subsets of both the observations and features of the data.
Our approaches are general and can be employed with a variety of existing feature selection strategies and machine learning techniques.
arXiv Detail & Related papers (2020-10-16T17:41:08Z)
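The mechanism in the entry above is concrete enough to sketch: draw many tiny random "minipatches" of rows and columns, run a base selector on each, and keep the features whose selection frequency is high. The base selector, minipatch sizes, and threshold below are assumptions for illustration, not the paper's tuned procedure.

```python
# Minipatch feature selection sketch: ensemble the selection events of a
# base selector over tiny random subsets of observations and features.
import numpy as np

rng = np.random.default_rng(0)

def minipatch_select(X, y, n_patches=500, n_rows=30, n_cols=5, top_k=2, thresh=0.5):
    n, d = X.shape
    selected = np.zeros(d)
    appeared = np.zeros(d)
    for _ in range(n_patches):
        rows = rng.choice(n, n_rows, replace=False)
        cols = rng.choice(d, n_cols, replace=False)
        # Base selector (assumed): top-k absolute correlation with y.
        corr = np.abs([np.corrcoef(X[rows][:, j], y[rows])[0, 1] for j in cols])
        appeared[cols] += 1
        selected[cols[np.argsort(corr)[-top_k:]]] += 1
    freq = selected / np.maximum(appeared, 1)
    return np.flatnonzero(freq >= thresh)

# Toy data: only features 0 and 1 carry signal.
X = rng.normal(size=(200, 20))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)
print(minipatch_select(X, y))
```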
- Dynamic Feature Acquisition with Arbitrary Conditional Flows [11.655069211977464]
We propose models that dynamically acquire new features to further improve predictive performance.
We leverage an information theoretic metric, conditional mutual information, to select the most informative feature to acquire.
Our model demonstrates superior performance over baselines evaluated in multiple settings.
arXiv Detail & Related papers (2020-06-13T19:01:12Z)
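Several entries above, including the main paper and this conditional-flow approach, ultimately estimate the same quantity. For reference, the sketch below (illustrative only, not from any listed paper) computes the conditional mutual information I(Y; X | Z) exactly for discrete variables given their joint probability table.

```python
# Exact conditional mutual information I(Y; X | Z) for discrete variables,
# computed from a joint probability table p[y, x, z].
import numpy as np

def conditional_mutual_information(p):
    p = p / p.sum()
    p_z = p.sum(axis=(0, 1))            # p(z)
    p_xz = p.sum(axis=0)                # p(x, z)
    p_yz = p.sum(axis=1)                # p(y, z)
    cmi = 0.0
    for y in range(p.shape[0]):
        for x in range(p.shape[1]):
            for z in range(p.shape[2]):
                if p[y, x, z] > 0:
                    cmi += p[y, x, z] * np.log(
                        p[y, x, z] * p_z[z] / (p_xz[x, z] * p_yz[y, z])
                    )
    return cmi  # in nats

# Example: X is a noisy copy of Y, Z is independent, so I(Y; X | Z) > 0.
p = np.array([[[0.2, 0.2], [0.05, 0.05]],
              [[0.05, 0.05], [0.2, 0.2]]])
print(conditional_mutual_information(p))
```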
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.