Learning to Maximize Mutual Information for Dynamic Feature Selection
- URL: http://arxiv.org/abs/2301.00557v2
- Date: Thu, 8 Jun 2023 07:32:18 GMT
- Title: Learning to Maximize Mutual Information for Dynamic Feature Selection
- Authors: Ian Covert, Wei Qiu, Mingyu Lu, Nayoon Kim, Nathan White, Su-In Lee
- Abstract summary: We consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information.
We explore a simpler approach of greedily selecting features based on their conditional mutual information.
The proposed method is shown to recover the greedy policy when trained to optimality, and it outperforms numerous existing feature selection methods in our experiments.
- Score: 13.821253491768168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature selection helps reduce data acquisition costs in ML, but the standard
approach is to train models with static feature subsets. Here, we consider the
dynamic feature selection (DFS) problem where a model sequentially queries
features based on the presently available information. DFS is often addressed
with reinforcement learning, but we explore a simpler approach of greedily
selecting features based on their conditional mutual information. This method
is theoretically appealing but requires oracle access to the data distribution,
so we develop a learning approach based on amortized optimization. The proposed
method is shown to recover the greedy policy when trained to optimality, and it
outperforms numerous existing feature selection methods in our experiments,
thus validating it as a simple but powerful approach for this problem.
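The greedy criterion described in the abstract can be illustrated with a minimal sketch. It assumes oracle access by treating a small, fully enumerated table of discrete features as the data distribution. This is not the paper's learned, amortized method. The function names (`entropy`, `conditional_mi`, `greedy_cmi_select`) and the toy data are hypothetical and introduced only for illustration.

```python
import itertools
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def conditional_mi(X, y, rows, i):
    """I(y; x_i | x_S = x_s), computed on the rows consistent with x_s."""
    h_before = entropy(y[rows])
    h_after = 0.0
    for v in np.unique(X[rows, i]):
        sub = rows[X[rows, i] == v]
        h_after += len(sub) / len(rows) * entropy(y[sub])
    return h_before - h_after

def greedy_cmi_select(X, y, x_query, budget):
    """Sequentially pick features for a single instance x_query."""
    rows = np.arange(len(X))  # rows consistent with what has been observed
    selected = []
    for _ in range(budget):
        candidates = [i for i in range(X.shape[1]) if i not in selected]
        scores = [conditional_mi(X, y, rows, i) for i in candidates]
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        # "Query" the chosen feature: keep rows matching its observed value.
        rows = rows[X[rows, best] == x_query[best]]
        if len(rows) == 0:  # observed values never occur in the table
            break
    return selected

# Toy distribution (assumption): x0 decides whether x1 or x2 matters.
X = np.array(list(itertools.product([0, 1], repeat=3)))
y = 2 * X[:, 0] + np.where(X[:, 0] == 0, X[:, 1], X[:, 2])

print(greedy_cmi_select(X, y, np.array([0, 1, 0]), budget=2))  # [0, 1]
print(greedy_cmi_select(X, y, np.array([1, 1, 0]), budget=2))  # [0, 2]
```

In this toy setting the second query depends on the observed value of the first feature, which is the behavior that separates dynamic selection from a static feature subset. With real data the conditional distributions are unknown, which is why the abstract turns to a learning approach.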
Related papers
- LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z) - Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z) - Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z) - Estimating Conditional Mutual Information for Dynamic Feature Selection [14.706269510726356]
Dynamic feature selection is a promising paradigm to reduce feature acquisition costs and provide transparency into a model's predictions.
Here, we take an information-theoretic perspective and prioritize features based on their mutual information with the response variable.
Our method provides consistent gains over recent methods across a variety of datasets.
arXiv Detail & Related papers (2023-06-05T23:03:03Z) - MILO: Model-Agnostic Subset Selection Framework for Efficient Model
Training and Tuning [68.12870241637636]
We propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training.
Our empirical results indicate that MILO can train models $3\times$ to $10\times$ faster and tune hyperparameters $20\times$ to $75\times$ faster than full-dataset training or tuning without compromising performance.
arXiv Detail & Related papers (2023-01-30T20:59:30Z) - Greedy Modality Selection via Approximate Submodular Maximization [19.22947539760366]
Multimodal learning considers learning from multi-modality data, aiming to fuse heterogeneous sources of information.
It is not always feasible to leverage all available modalities due to memory constraints.
We study modality selection, intending to efficiently select the most informative and complementary modalities under certain computational constraints.
arXiv Detail & Related papers (2022-10-22T22:07:27Z) - SHiFT: An Efficient, Flexible Search Engine for Transfer Learning [16.289623977712086]
Transfer learning can be seen as a data- and compute-efficient alternative to training models from scratch.
We propose SHiFT, the first downstream task-aware, flexible, and efficient model search engine for transfer learning.
arXiv Detail & Related papers (2022-04-04T13:16:46Z) - Practical Active Learning with Model Selection for Small Data [13.128648437690224]
We develop a simple and fast method for practical active learning with model selection.
Our method is based on an underlying pool-based active learner for binary classification using support vector classification with a radial basis function kernel.
arXiv Detail & Related papers (2021-12-21T23:11:27Z) - Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning.
The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned.
Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z) - Joint Adaptive Graph and Structured Sparsity Regularization for
Unsupervised Feature Selection [6.41804410246642]
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method.
A subset of optimal features will be selected in groups, and the number of selected features will be determined automatically.
Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z) - S^3-Rec: Self-Supervised Learning for Sequential Recommendation with
Mutual Information Maximization [104.87483578308526]
We propose the model S^3-Rec, which stands for Self-Supervised learning for Sequential Recommendation.
For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence.
Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
arXiv Detail & Related papers (2020-08-18T11:44:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.