TCMI: a non-parametric mutual-dependence estimator for multivariate continuous distributions
- URL: http://arxiv.org/abs/2001.11212v3
- Date: Sat, 30 Jul 2022 09:07:56 GMT
- Title: TCMI: a non-parametric mutual-dependence estimator for multivariate continuous distributions
- Authors: Benjamin Regler, Matthias Scheffler, Luca M. Ghiringhelli
- Abstract summary: Total cumulative mutual information (TCMI) is a measure of the relevance of mutual dependences.
TCMI is a non-parametric, robust, and deterministic measure that facilitates comparisons and rankings between feature sets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The identification of relevant features, i.e., the driving variables that
determine a process or the properties of a system, is an essential part of the
analysis of data sets with a large number of variables. A mathematically rigorous
approach to quantifying the relevance of these features is mutual information.
Mutual information determines the relevance of features in terms of their joint
mutual dependence on the property of interest. However, mutual information
requires probability distributions as input, and these cannot be reliably
estimated for continuous distributions, such as those of physical quantities
like lengths or energies. Here, we introduce total cumulative mutual information
(TCMI), a measure of the relevance of mutual dependences that extends mutual
information to continuously distributed random variables based on cumulative
probability distributions. TCMI is a non-parametric, robust, and deterministic
measure that facilitates comparisons and rankings between feature sets of
different cardinality. The ranking induced by TCMI allows for feature selection,
i.e., the identification of variable sets that are nonlinearly statistically
related to a property of interest, taking into account the number of data
samples as well as the cardinality of the set of variables. We evaluate the
performance of our measure with simulated data, compare it with similar
multivariate-dependence measures, and demonstrate the effectiveness of our
feature-selection method on a set of standard data sets and a typical scenario
in materials science.
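As a rough, hedged illustration of the cumulative-distribution idea only (the exact TCMI estimator is defined in the paper), the sketch below scores the dependence between two continuous samples by comparing the empirical joint cumulative distribution with the product of the empirical marginal cumulative distributions, a Hoeffding-style plug-in statistic that avoids density estimation entirely. The function name `cumulative_dependence_score` and the mean-absolute-difference normalization are illustrative assumptions, not the TCMI formula.

```python
import numpy as np

def empirical_cdf(values, grid):
    """Empirical CDF of `values` evaluated at each point of `grid`."""
    return np.mean(values[None, :] <= grid[:, None], axis=1)

def cumulative_dependence_score(x, y):
    """Illustrative cumulative-distribution-based dependence score.

    Compares the empirical joint CDF F(x, y) with the product of the
    marginal CDFs F(x) * F(y) at the sample points (Hoeffding-style).
    NOTE: a simplified stand-in for illustration, not the TCMI estimator.
    """
    x, y = np.asarray(x), np.asarray(y)
    fx = empirical_cdf(x, x)  # marginal CDF of x at the sample points
    fy = empirical_cdf(y, y)  # marginal CDF of y at the sample points
    # joint empirical CDF evaluated at each sample point (x_i, y_i)
    fxy = np.mean((x[None, :] <= x[:, None]) & (y[None, :] <= y[:, None]), axis=1)
    return np.mean(np.abs(fxy - fx * fy))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
print(cumulative_dependence_score(x, 2.0 * x + rng.normal(size=500)))  # clearly dependent
print(cumulative_dependence_score(x, rng.normal(size=500)))            # approximately independent
```

Because every quantity is an empirical cumulative distribution, the score depends only on the ranks of the sample values, which is what makes cumulative-distribution-based measures attractive for continuous physical quantities.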
Related papers
- Normalization in Proportional Feature Spaces [49.48516314472825]
Normalization plays a central role in data representation, characterization, visualization, analysis, comparison, classification, and modeling.
The selection of an appropriate normalization method needs to take into account the type and characteristics of the involved features.
arXiv Detail & Related papers (2024-09-17T17:46:27Z)
- Towards a Unified Theory for Semiparametric Data Fusion with Individual-Level Data [1.0650780147044159]
We address the goal of conducting inference about a smooth finite-dimensional parameter by utilizing individual-level data from various independent sources.
Recent advancements have led to the development of a comprehensive theory capable of handling scenarios where different data sources align with, possibly distinct subsets of, conditional distributions of a single factorization of the joint target distribution.
We extend the aforementioned comprehensive theory to allow for the fusion of individual-level data from sources aligned with conditional distributions that do not correspond to a single factorization of the target distribution.
arXiv Detail & Related papers (2024-09-16T04:10:44Z)
- On the Properties and Estimation of Pointwise Mutual Information Profiles [49.877314063833296]
The pointwise mutual information profile, or simply profile, is the distribution of pointwise mutual information for a given pair of random variables.
We introduce a novel family of distributions, Bend and Mix Models, for which the profile can be accurately estimated using Monte Carlo methods.
arXiv Detail & Related papers (2023-10-16T10:02:24Z)
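For a distribution whose densities are known in closed form, the profile can be sampled directly by Monte Carlo. The sketch below is only an illustration (it does not use the Bend and Mix Models introduced in the paper): it draws from a standard bivariate Gaussian with correlation `rho` and evaluates the pointwise mutual information log p(x, y) - log p(x) - log p(y) at each sample, so that the mean of the profile recovers the mutual information -0.5 * log(1 - rho^2).

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def pmi_profile_gaussian(rho, n_samples=100_000, seed=0):
    """Monte Carlo sample of the pointwise mutual information (PMI) profile
    for a standard bivariate Gaussian with correlation `rho`."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_samples)
    joint = multivariate_normal(mean=[0.0, 0.0], cov=cov)
    # PMI(x, y) = log p(x, y) - log p(x) - log p(y)
    return joint.logpdf(xy) - norm.logpdf(xy[:, 0]) - norm.logpdf(xy[:, 1])

pmi = pmi_profile_gaussian(rho=0.8)
print("estimated MI:", pmi.mean())  # about -0.5 * log(1 - 0.8**2) = 0.51
print("profile quantiles:", np.quantile(pmi, [0.05, 0.5, 0.95]))
```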
- Beyond Normal: On the Evaluation of Mutual Information Estimators [52.85079110699378]
We show how to construct a diverse family of distributions with known ground-truth mutual information.
We provide guidelines for practitioners on how to select an estimator appropriate to the difficulty of the problem considered.
arXiv Detail & Related papers (2023-06-19T17:26:34Z)
- Enriching Disentanglement: From Logical Definitions to Quantitative Metrics [59.12308034729482]
Disentangling the explanatory factors in complex data is a promising approach for data-efficient representation learning.
We establish relationships between logical definitions and quantitative metrics to derive theoretically grounded disentanglement metrics.
We empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
arXiv Detail & Related papers (2023-05-19T08:22:23Z)
- Opening the random forest black box by the analysis of the mutual impact of features [0.0]
We propose two novel approaches that focus on the mutual impact of features in random forests.
MFI and MIR are promising approaches for shedding light on the complex relationships between features and the outcome.
arXiv Detail & Related papers (2023-04-05T15:03:46Z)
- Diffeomorphic Information Neural Estimation [2.566492438263125]
Mutual Information (MI) and Conditional Mutual Information (CMI) are multi-purpose tools from information theory.
We introduce DINE (Diffeomorphic Information Neural Estimator), a novel approach for estimating the CMI of continuous random variables.
We show that the variables of interest can be replaced with appropriate surrogates that follow simpler distributions.
arXiv Detail & Related papers (2022-11-20T03:03:56Z)
- Measuring Statistical Dependencies via Maximum Norm and Characteristic Functions [0.0]
We propose a statistical dependence measure based on the maximum-norm of the difference between joint and product-marginal characteristic functions.
The proposed measure can detect arbitrary statistical dependence between two random vectors of possibly different dimensions.
We conduct experiments with both simulated and real data.
arXiv Detail & Related papers (2022-08-16T20:24:31Z)
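As a hedged illustration of this idea only (not the authors' implementation, and restricted to scalar variables and an arbitrary finite frequency grid), the sketch below compares the empirical joint characteristic function with the product of the empirical marginal characteristic functions and reports the maximum absolute difference over the grid.

```python
import numpy as np

def max_norm_cf_dependence(x, y, freqs=np.linspace(-3.0, 3.0, 41)):
    """Max |phi_XY(s, t) - phi_X(s) * phi_Y(t)| over a frequency grid,
    using empirical characteristic functions (illustrative sketch)."""
    x, y = np.asarray(x), np.asarray(y)
    s, t = np.meshgrid(freqs, freqs)                          # frequency grid
    phi_x = np.exp(1j * s[..., None] * x).mean(axis=-1)       # empirical phi_X(s)
    phi_y = np.exp(1j * t[..., None] * y).mean(axis=-1)       # empirical phi_Y(t)
    phi_xy = np.exp(1j * (s[..., None] * x + t[..., None] * y)).mean(axis=-1)
    return np.max(np.abs(phi_xy - phi_x * phi_y))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
print(max_norm_cf_dependence(x, x**2 + 0.1 * rng.normal(size=1000)))  # nonlinear dependence
print(max_norm_cf_dependence(x, rng.normal(size=1000)))               # near zero under independence
```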
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
- Disentanglement Analysis with Partial Information Decomposition [31.56299813238937]
Disentangled representations aim at reversing the process by mapping data to multiple random variables that individually capture distinct generative factors.
Current disentanglement metrics are designed to measure the concentration, e.g., absolute deviation, variance, or entropy, of each variable conditioned on each generative factor.
In this work, we use the Partial Information Decomposition framework to evaluate information sharing between more than two variables, and build a framework, including a new disentanglement metric.
arXiv Detail & Related papers (2021-08-31T11:09:40Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
arXiv Detail & Related papers (2020-06-09T23:26:15Z)
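A common way to estimate point-wise dependency PD(x, y) = p(x, y) / (p(x) p(y)) without explicit density estimation is the density-ratio/classification trick: train a probabilistic classifier to distinguish genuinely paired samples, drawn from p(x, y), from artificially shuffled pairs, drawn approximately from p(x) p(y); the classifier's odds ratio then approximates PD. The sketch below is a simplified stand-in under that assumption, using logistic regression on polynomial features rather than the neural estimators discussed in the paper above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

def pointwise_dependency(x, y, query_xy, seed=0):
    """Estimate PD(x, y) = p(x, y) / (p(x) p(y)) at `query_xy` by training a
    classifier to separate paired samples from shuffled (independent) pairs."""
    rng = np.random.default_rng(seed)
    paired = np.column_stack([x, y])                      # samples from p(x, y)
    shuffled = np.column_stack([x, rng.permutation(y)])   # samples from ~ p(x) p(y)
    data = np.vstack([paired, shuffled])
    labels = np.r_[np.ones(len(paired)), np.zeros(len(shuffled))]
    feats = PolynomialFeatures(degree=2, include_bias=False)
    clf = LogisticRegression(max_iter=1000).fit(feats.fit_transform(data), labels)
    proba = clf.predict_proba(feats.transform(np.atleast_2d(query_xy)))[:, 1]
    return proba / (1.0 - proba)                          # odds ratio approximates PD

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y = 0.8 * x + 0.6 * rng.normal(size=5000)                 # correlated Gaussian pair
print(pointwise_dependency(x, y, query_xy=[1.0, 1.0]))    # > 1: outcomes likely to co-occur
print(pointwise_dependency(x, y, query_xy=[1.0, -1.0]))   # < 1: outcomes unlikely to co-occur
```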
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.