A Rigorous Information-Theoretic Definition of Redundancy and Relevancy
in Feature Selection Based on (Partial) Information Decomposition
- URL: http://arxiv.org/abs/2105.04187v4
- Date: Thu, 4 May 2023 08:49:48 GMT
- Title: A Rigorous Information-Theoretic Definition of Redundancy and Relevancy
in Feature Selection Based on (Partial) Information Decomposition
- Authors: Patricia Wollstadt and Sebastian Schmitt and Michael Wibral
- Abstract summary: We argue that classical information theory does not provide measures to decompose the information a set of variables provides about a target into unique, redundant, and synergistic contributions.
Using partial information decomposition (PID), we provide a novel definition of feature relevancy and redundancy in PID terms.
We propose an iterative, CMI-based algorithm for practical feature selection.
- Score: 0.0483420384410068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Selecting a minimal feature set that is maximally informative about a target
variable is a central task in machine learning and statistics. Information
theory provides a powerful framework for formulating feature selection
algorithms -- yet, a rigorous, information-theoretic definition of feature
relevancy, which accounts for feature interactions such as redundant and
synergistic contributions, is still missing. We argue that this lack is
inherent to classical information theory which does not provide measures to
decompose the information a set of variables provides about a target into
unique, redundant, and synergistic contributions. Such a decomposition has been
introduced only recently by the partial information decomposition (PID)
framework. Using PID, we clarify why feature selection is a conceptually
difficult problem when approached using information theory and provide a novel
definition of feature relevancy and redundancy in PID terms. From this
definition, we show that the conditional mutual information (CMI) maximizes
relevancy while minimizing redundancy and propose an iterative, CMI-based
algorithm for practical feature selection. We demonstrate the power of our
CMI-based algorithm in comparison to the unconditional mutual information on
benchmark examples and provide corresponding PID estimates to highlight how PID
allows us to quantify the information contribution of features and their interactions
in feature-selection problems.
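
The iterative, CMI-based selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes discrete features, uses a simple plug-in (frequency-count) entropy estimator, and stops at a fixed threshold instead of a statistical test. The function names `entropy`, `cmi`, and `greedy_cmi_selection`, the `threshold` parameter, and the toy data are all hypothetical. At each step, the sketch adds the feature with the largest conditional mutual information with the target given the features already selected, so a copy of an already selected feature scores zero.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(columns):
    """Plug-in Shannon entropy (bits) of the joint distribution of the columns."""
    if not columns:
        return 0.0
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def cmi(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z); z is a list of columns."""
    return entropy([x] + z) + entropy([y] + z) - entropy([x, y] + z) - entropy(z)


def greedy_cmi_selection(features, target, threshold=1e-9):
    """Forward selection: repeatedly add the feature with the largest CMI
    given the already selected features; stop when no CMI exceeds threshold."""
    selected = []
    remaining = list(range(len(features)))
    while remaining:
        cond = [features[j] for j in selected]
        scores = {i: cmi(features[i], target, cond) for i in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= threshold:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed for illustration): every combination of three fair bits.
rows = list(product([0, 1], repeat=3))
x0 = [r[0] for r in rows]
x1 = [r[1] for r in rows]
noise = [r[2] for r in rows]          # independent of the target
x2 = list(x0)                         # redundant copy of x0
y = [2 * a + b for a, b in zip(x0, x1)]  # target depends on x0 and x1 only

features = [x0, x1, x2, noise]
selected = greedy_cmi_selection(features, y)
print(selected)
```

In this toy setup, `x0`, `x1`, and `x2` each share the same unconditional mutual information with the target, so a ranking by mutual information alone cannot tell the redundant copy `x2` apart from the informative `x1`. Conditioning on the already selected `x0` separates them. A forward search of this kind can still miss purely synergistic features, whose individual contributions are zero in the first step.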
Related papers
- Partial Information Decomposition for Data Interpretability and Feature Selection [3.7414804164475983]
Partial Information Decomposition of Features (PIDF) is a new paradigm for simultaneous data interpretability and feature selection.
We extensively evaluate PIDF using both synthetic and real-world data, demonstrating its potential applications and effectiveness.
arXiv Detail & Related papers (2024-05-29T15:54:03Z)
- Differentiable Information Bottleneck for Deterministic Multi-view Clustering [9.723389925212567]
We propose a new differentiable information bottleneck (DIB) method, which provides a deterministic and analytical MVC solution.
Specifically, we first propose to directly fit the mutual information of high-dimensional spaces by leveraging normalized kernel Gram matrix.
Then, based on the new mutual information measurement, a deterministic multi-view neural network with analytical gradients is explicitly trained to parameterize IB principle.
arXiv Detail & Related papers (2024-03-23T02:13:22Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Learning Cross-modality Information Bottleneck Representation for Heterogeneous Person Re-Identification [61.49219876388174]
Visible-Infrared person re-identification (VI-ReID) is an important and challenging task in intelligent video surveillance.
Existing methods mainly focus on learning a shared feature space to reduce the modality discrepancy between visible and infrared modalities.
We present a novel mutual information and modality consensus network, namely CMInfoNet, to extract modality-invariant identity features.
arXiv Detail & Related papers (2023-08-29T06:55:42Z)
- Interpretability with full complexity by constraining feature information [1.52292571922932]
Interpretability is a pressing issue for machine learning.
We approach interpretability from a new angle: constrain the information about the features without restricting the complexity of the model.
We develop a framework for extracting insight from the spectrum of approximate models.
arXiv Detail & Related papers (2022-11-30T18:59:01Z)
- Unsupervised Features Ranking via Coalitional Game Theory for Categorical Data [0.28675177318965034]
Unsupervised feature selection aims to reduce the number of features.
We show that the resulting feature selection outperforms competing methods in lowering the redundancy rate.
arXiv Detail & Related papers (2022-05-17T14:17:36Z)
- Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z)
- Estimating Structural Target Functions using Machine Learning and Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification [98.7585431239291]
Video-based person re-identification aims at matching the same person across video clips.
In this paper, we propose an attentive feature aggregation module, namely the Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA) module.
Our framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2020-03-27T03:49:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.