Measuring Dependence with Matrix-based Entropy Functional
- URL: http://arxiv.org/abs/2101.10160v1
- Date: Mon, 25 Jan 2021 15:18:16 GMT
- Title: Measuring Dependence with Matrix-based Entropy Functional
- Authors: Shujian Yu, Francesco Alesiani, Xi Yu, Robert Jenssen, Jose C. Principe
- Abstract summary: Measuring the dependence of data plays a central role in statistics and machine learning.
We propose two measures, namely the matrix-based normalized total correlation ($T_\alpha^*$) and the matrix-based normalized dual total correlation ($D_\alpha^*$).
We show that our measures are differentiable and statistically more powerful than prevalent ones.
- Score: 21.713076360132195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Measuring the dependence of data plays a central role in statistics and
machine learning. In this work, we summarize and generalize the main idea of
existing information-theoretic dependence measures into a higher-level
perspective by Shearer's inequality. Based on our generalization, we then
propose two measures, namely the matrix-based normalized total correlation
($T_\alpha^*$) and the matrix-based normalized dual total correlation
($D_\alpha^*$), to quantify the dependence of multiple variables in arbitrary
dimensional space, without explicit estimation of the underlying data
distributions. We show that our measures are differentiable and statistically
more powerful than prevalent ones. We also show the impact of our measures in
four different machine learning problems, namely the gene regulatory network
inference, the robust machine learning under covariate shift and non-Gaussian
noises, the subspace outlier detection, and the understanding of the learning
dynamics of convolutional neural networks (CNNs), to demonstrate their
utilities, advantages, as well as implications to those problems. Code of our
dependence measure is available at: https://bit.ly/AAAI-dependence
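The proposed measures are built on the matrix-based Rényi α-entropy, which is computed from the eigenvalues of a trace-normalized kernel Gram matrix, with joint entropies formed via Hadamard (elementwise) products of the marginal Gram matrices. As a rough illustration only (not the authors' released code; the Gaussian kernel width `sigma` and the order `alpha` are free choices of this sketch), an unnormalized total correlation can be computed as:

```python
import numpy as np

def gram_matrix(X, sigma=1.0):
    """Trace-normalized Gaussian-kernel Gram matrix of the samples in X."""
    sq = np.sum(X ** 2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2.0 * X @ X.T
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    return K / np.trace(K)

def matrix_renyi_entropy(A, alpha=1.01):
    """Matrix-based Renyi alpha-entropy: S = log2(sum_i lam_i^alpha) / (1 - alpha)."""
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    return float(np.log2(np.sum(lam ** alpha)) / (1.0 - alpha))

def total_correlation(variables, alpha=1.01, sigma=1.0):
    """Unnormalized total correlation: sum of marginal entropies minus the
    joint entropy, where the joint Gram matrix is the trace-normalized
    Hadamard product of the marginal Gram matrices."""
    grams = [gram_matrix(V, sigma) for V in variables]
    joint = grams[0].copy()
    for A in grams[1:]:
        joint = joint * A
    joint = joint / np.trace(joint)
    marginal_sum = sum(matrix_renyi_entropy(A, alpha) for A in grams)
    return marginal_sum - matrix_renyi_entropy(joint, alpha)
```

The paper's $T_\alpha^*$ and $D_\alpha^*$ additionally normalize such quantities; the exact normalization and the dual form are given in the paper and its released code.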
Related papers
- Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra [0.0]
Neural scaling laws describe how the performance of deep neural networks scales with key factors such as training data size, model complexity, and training time.
We employ techniques from statistical mechanics to analyze one-pass gradient descent within a student-teacher framework.
arXiv Detail & Related papers (2024-10-11T17:21:42Z)
- Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z)
- Max-Sliced Mutual Information [17.667315953598788]
Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference.
Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure.
This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI)
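The mSMI idea, maximizing mutual information over one-dimensional projections of each variable, can be caricatured with random slices and a plug-in histogram MI estimator; this is a crude stand-in for the paper's actual estimator, and `n_dirs` and `bins` are arbitrary choices of this sketch:

```python
import numpy as np

def histogram_mi(x, y, bins=16):
    """Plug-in mutual information (nats) of two 1-D samples via a joint histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

def max_sliced_mi(X, Y, n_dirs=200, bins=16, seed=0):
    """Maximize the 1-D plug-in MI over random unit projections of X and Y."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_dirs):
        a = rng.normal(size=X.shape[1])
        b = rng.normal(size=Y.shape[1])
        a /= np.linalg.norm(a)
        b /= np.linalg.norm(b)
        best = max(best, histogram_mi(X @ a, Y @ b, bins))
    return best
```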
arXiv Detail & Related papers (2023-09-28T06:49:25Z)
- iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models [48.33685559041322]
This paper focuses on identifying the causal mechanism shifts in two or more related datasets over the same set of variables.
Code implementing the proposed method is open-source and publicly available at https://github.com/kevinsbello/iSCAN.
arXiv Detail & Related papers (2023-06-30T01:48:11Z)
- Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z)
- Measuring Statistical Dependencies via Maximum Norm and Characteristic Functions [0.0]
We propose a statistical dependence measure based on the maximum-norm of the difference between joint and product-marginal characteristic functions.
The proposed measure can detect arbitrary statistical dependence between two random vectors of possibly different dimensions.
We conduct experiments both with simulated and real data.
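A minimal empirical version of such a measure can approximate the max-norm over a random sample of frequency pairs rather than a true supremum; the frequency `scale` and count below are assumptions of this sketch, not the paper's construction:

```python
import numpy as np

def cf_dependence(X, Y, n_freqs=500, scale=1.0, seed=0):
    """Largest gap |phi_XY(t, s) - phi_X(t) * phi_Y(s)| between the empirical
    joint and product-marginal characteristic functions, taken over a random
    sample of frequency pairs (t, s)."""
    rng = np.random.default_rng(seed)
    T = rng.normal(scale=scale, size=(n_freqs, X.shape[1]))
    S = rng.normal(scale=scale, size=(n_freqs, Y.shape[1]))
    u = X @ T.T  # (n_samples, n_freqs) phases for X
    v = Y @ S.T  # (n_samples, n_freqs) phases for Y
    phi_joint = np.exp(1j * (u + v)).mean(axis=0)
    phi_x = np.exp(1j * u).mean(axis=0)
    phi_y = np.exp(1j * v).mean(axis=0)
    return float(np.max(np.abs(phi_joint - phi_x * phi_y)))
```

The gap is near zero for independent vectors (characteristic functions factorize) and bounded above by 2 by construction.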
arXiv Detail & Related papers (2022-08-16T20:24:31Z)
- Causality-Based Multivariate Time Series Anomaly Detection [63.799474860969156]
We formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data.
We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism.
We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications.
arXiv Detail & Related papers (2022-06-30T06:00:13Z)
- Deep Archimedean Copulas [98.96141706464425]
ACNet is a novel differentiable neural network architecture that enforces structural properties.
We show that ACNet is able to both approximate common Archimedean Copulas and generate new copulas which may provide better fits to data.
arXiv Detail & Related papers (2020-12-05T22:58:37Z)
- Causal learning with sufficient statistics: an information bottleneck approach [3.720546514089338]
Methods extracting causal information from conditional independencies between variables of a system are common.
We capitalize on the fact that the laws governing the generative mechanisms of a system often result in substructures embodied in the generative functional equation of a variable.
We propose to use the Information Bottleneck method, a technique commonly applied for dimensionality reduction, to find underlying sufficient sets of statistics.
arXiv Detail & Related papers (2020-10-12T00:20:01Z)
- General stochastic separation theorems with optimal bounds [68.8204255655161]
The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
arXiv Detail & Related papers (2020-10-11T13:12:41Z)
- Information Theory Measures via Multidimensional Gaussianization [7.788961560607993]
Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems.
It has several desirable properties for real-world applications.
However, obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality.
arXiv Detail & Related papers (2020-10-08T07:22:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.