Related papers: Learning to Maximize Mutual Information for Dynamic Feature Selection

Learning to Maximize Mutual Information for Dynamic Feature Selection

URL: http://arxiv.org/abs/2301.00557v2
Date: Thu, 8 Jun 2023 07:32:18 GMT
Title: Learning to Maximize Mutual Information for Dynamic Feature Selection
Authors: Ian Covert, Wei Qiu, Mingyu Lu, Nayoon Kim, Nathan White, Su-In Lee
Abstract summary: We consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. We explore a simpler approach of greedily selecting features based on their conditional mutual information. The proposed method is shown to recover the greedy policy when trained to optimality, and it outperforms numerous existing feature selection methods in our experiments.
Score: 13.821253491768168
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning, but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality, and it outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.

Related papers

Does This Look Familiar to You? Knowledge Analysis via Model Internal Representations [0.0]
There is no clearly established methodology for effective training data selection.<n>Model Internal Representations (KAMIR) is a novel approach that overcomes these limitations.<n>It can be applied to a wide range of tasks such as machine reading comprehension and summarization.
arXiv Detail & Related papers (2025-09-09T01:08:15Z)
SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL)<n>We propose textbfSPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation [77.07879255360342]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating retrieved information.<n>In RAG, the emphasis has shifted to utility, which considers the usefulness of passages for generating accurate answers.<n>Our approach focuses on utility-based selection rather than ranking, enabling dynamic passage selection tailored to specific queries without the need for fixed thresholds.<n>Our experiments demonstrate that utility-based selection provides a flexible and cost-effective solution for RAG, significantly reducing computational costs while improving answer quality.
arXiv Detail & Related papers (2025-07-25T09:32:29Z)
TayFCS: Towards Light Feature Combination Selection for Deep Recommender Systems [44.80081613834248]
Taylor Expansion Scorer (TayScorer) module for field-wise Taylor expansion on the base model.<n> Logistic Regression Elimination (LRE) estimates the corresponding information gain based on the model prediction performance.
arXiv Detail & Related papers (2025-07-05T04:22:42Z)
LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science. Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data. We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures. We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets. Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly. FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
Estimating Conditional Mutual Information for Dynamic Feature Selection [14.706269510726356]
Dynamic feature selection is a promising paradigm to reduce feature acquisition costs and provide transparency into a model's predictions. Here, we take an information-theoretic perspective and prioritize features based on their mutual information with the response variable. Our method provides consistent gains over recent methods across a variety of datasets.
arXiv Detail & Related papers (2023-06-05T23:03:03Z)
MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning [68.12870241637636]
We propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training. Our empirical results indicate that MILO can train models $3times - 10 times$ faster and tune hyperparameters $20times - 75 times$ faster than full-dataset training or tuning without performance.
arXiv Detail & Related papers (2023-01-30T20:59:30Z)
Greedy Modality Selection via Approximate Submodular Maximization [19.22947539760366]
Multimodal learning considers learning from multi-modality data, aiming to fuse heterogeneous sources of information. It is not always feasible to leverage all available modalities due to memory constraints. We study modality selection, intending to efficiently select the most informative and complementary modalities under certain computational constraints.
arXiv Detail & Related papers (2022-10-22T22:07:27Z)
SHiFT: An Efficient, Flexible Search Engine for Transfer Learning [16.289623977712086]
Transfer learning can be seen as a data- and compute-efficient alternative to training models from scratch. We propose SHiFT, the first downstream task-aware, flexible, and efficient model search engine for transfer learning.
arXiv Detail & Related papers (2022-04-04T13:16:46Z)
Practical Active Learning with Model Selection for Small Data [13.128648437690224]
We develop a simple and fast method for practical active learning with model selection. Our method is based on an underlying pool-based active learner for binary classification using support vector classification with a radial basis function kernel.
arXiv Detail & Related papers (2021-12-21T23:11:27Z)
Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning. The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned. Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z)
Joint Adaptive Graph and Structured Sparsity Regularization for Unsupervised Feature Selection [6.41804410246642]
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method. A subset of optimal features will be selected in group, and the number of selected features will be determined automatically. Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z)
S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization [104.87483578308526]
We propose the model S3-Rec, which stands for Self-Supervised learning for Sequential Recommendation. For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence. Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
arXiv Detail & Related papers (2020-08-18T11:44:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.