Measuring the Discrepancy between Conditional Distributions: Methods, Properties and Applications
- URL: http://arxiv.org/abs/2005.02196v2
- Date: Tue, 29 Dec 2020 00:32:02 GMT
- Title: Measuring the Discrepancy between Conditional Distributions: Methods, Properties and Applications
- Authors: Shujian Yu, Ammar Shaker, Francesco Alesiani, Jose C. Principe
- Abstract summary: We propose a simple yet powerful test statistic to quantify the discrepancy between two conditional distributions.
The new statistic avoids the explicit estimation of the underlying distributions in high-dimensional space.
It inherits the merits of the correntropy function to explicitly incorporate high-order statistics in the data.
- Score: 18.293397644865458
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a simple yet powerful test statistic to quantify the discrepancy
between two conditional distributions. The new statistic avoids the explicit
estimation of the underlying distributions in high-dimensional space, and it
operates on the cone of symmetric positive semidefinite (SPS) matrices using the
Bregman matrix divergence. Moreover, it inherits the merits of the correntropy
function to explicitly incorporate high-order statistics in the data. We
present the properties of our new statistic and illustrate its connections to
prior art. Finally, we show applications of our new statistic to three
different machine learning problems, namely multi-task learning over graphs,
concept drift detection, and information-theoretic feature selection, to
demonstrate its utility and advantage. Code for our statistic is
available at https://bit.ly/BregmanCorrentropy.
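The linked repository holds the authors' implementation. Purely as an illustrative sketch of the ingredients named above, and not the paper's exact estimator, the snippet below builds trace-normalized Gaussian-kernel (correntropy-style) Gram matrices, which live on the SPS cone, and compares them with the von Neumann Bregman matrix divergence; the toy sample construction is made up for the example.

```python
import numpy as np
from scipy.linalg import logm
from scipy.spatial.distance import cdist

def gram(z, sigma=1.0):
    # Gaussian-kernel Gram matrix (the kernel underlying correntropy),
    # normalized to unit trace so it lies on the SPS cone like a density matrix.
    K = np.exp(-cdist(z, z, "sqeuclidean") / (2 * sigma ** 2))
    return K / np.trace(K)

def von_neumann_divergence(A, B):
    # Bregman matrix divergence generated by the von Neumann entropy:
    # D(A || B) = tr(A log A - A log B - A + B).
    return float(np.trace(A @ logm(A) - A @ logm(B) - A + B).real)

# Toy usage: stack (x, y) samples drawn under two different conditionals
# p(y|x) and q(y|x) and compare the resulting Gram matrices.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
p = np.hstack([x, x[:, :1] + 0.1 * rng.normal(size=(100, 1))])   # y =  x1 + noise
q = np.hstack([x, -x[:, :1] + 0.1 * rng.normal(size=(100, 1))])  # y = -x1 + noise
print(von_neumann_divergence(gram(p), gram(q)))
```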
Related papers
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space (sketched below).
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
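A loose, hypothetical sketch of the idea stated above: estimate a per-class feature distribution and draw virtual minority-class features from it. ProCo's actual feature model and closed-form loss differ, and the helper names here are made up.

```python
import numpy as np

def fit_class_gaussians(feats, labels):
    # Estimate a mean and covariance for the features of each class
    # (a Gaussian stand-in for whatever feature model is assumed).
    return {c: (feats[labels == c].mean(axis=0),
                np.cov(feats[labels == c], rowvar=False))
            for c in np.unique(labels)}

def sample_virtual_features(stats, c, n, rng):
    # Draw synthetic features for a minority class c so that contrastive
    # pairs can be formed as if the class were well represented.
    mu, cov = stats[c]
    return rng.multivariate_normal(mu, cov, size=n)
```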
- Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We revisit when and where double descent occurs, and show that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z)
- Measuring Statistical Dependencies via Maximum Norm and Characteristic Functions [0.0]
We propose a statistical dependence measure based on the maximum-norm of the difference between joint and product-marginal characteristic functions.
The proposed measure can detect arbitrary statistical dependence between two random vectors of possibly different dimensions.
We conduct experiments with both simulated and real data (the measure is sketched below).
arXiv Detail & Related papers (2022-08-16T20:24:31Z)
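A minimal sketch of such a measure, approximating the maximum over all frequency pairs by a maximum over random frequency draws; the paper's exact construction and normalization may differ.

```python
import numpy as np

def max_cf_dependence(x, y, n_freqs=2000, scale=1.0, seed=0):
    # |phi_{X,Y}(t, s) - phi_X(t) * phi_Y(s)| maximized over random (t, s).
    # The gap is zero for all (t, s) if and only if X and Y are independent,
    # so a random-frequency max approximates the sup-norm dependence measure.
    rng = np.random.default_rng(seed)
    t = rng.normal(scale=scale, size=(n_freqs, x.shape[1]))
    s = rng.normal(scale=scale, size=(n_freqs, y.shape[1]))
    joint = np.exp(1j * (x @ t.T + y @ s.T)).mean(axis=0)
    marg = np.exp(1j * (x @ t.T)).mean(axis=0) * np.exp(1j * (y @ s.T)).mean(axis=0)
    return float(np.abs(joint - marg).max())
```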
- Information Processing Equalities and the Information-Risk Bridge [10.451984251615512]
We introduce two new classes of measures of information for statistical experiments.
We derive a simple geometrical relationship between measures of information and the Bayes risk of a statistical decision problem.
arXiv Detail & Related papers (2022-07-25T08:54:36Z)
- A Unifying Framework for Some Directed Distances in Statistics [0.0]
Density-based directed distances -- particularly known as divergences -- are widely used in statistics.
We provide a general framework which covers in particular both the density-based and distribution-function-based divergence approaches.
We deduce new concepts of dependence between random variables, as alternatives to the celebrated mutual information (toy examples of the two approaches follow below).
arXiv Detail & Related papers (2022-03-02T04:24:13Z)
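For concreteness, two toy instances of the families this framework covers: a density-based directed distance (Kullback-Leibler) and a distribution-function-based one (a Cramér-type squared gap between CDFs), on a shared discrete support. These are standard examples, not the paper's notation.

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])  # two densities on a common discrete support
q = np.array([0.3, 0.4, 0.3])

# Density-based directed distance: Kullback-Leibler divergence D(p || q).
kl = float(np.sum(p * np.log(p / q)))

# Distribution-function-based distance: squared L2 gap between the CDFs.
cramer = float(np.sum((np.cumsum(p) - np.cumsum(q)) ** 2))

print(kl, cramer)
```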
- Uncertainty Modeling for Out-of-Distribution Generalization [56.957731893992495]
We argue that the feature statistics can be properly manipulated to improve the generalization ability of deep learning models.
Common methods often consider the feature statistics as deterministic values measured from the learned features.
We improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training (sketched below).
arXiv Detail & Related papers (2022-02-08T16:09:12Z)
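A minimal sketch of that idea for a CNN feature map, assuming the uncertainty of the per-channel statistics is estimated from their spread across the batch; the paper's exact formulation may differ, and the function name is made up.

```python
import torch

def perturb_feature_stats(x, eps=1e-6):
    # x: features of shape (N, C, H, W); apply only during training.
    mu = x.mean(dim=(2, 3), keepdim=True)                  # per-sample channel mean
    sig = x.var(dim=(2, 3), keepdim=True).add(eps).sqrt()  # per-sample channel std
    # Treat the statistics as random variables rather than deterministic
    # values: their spread across the batch estimates the uncertainty
    # induced by potential domain shifts.
    mu_unc = mu.var(dim=0, keepdim=True).add(eps).sqrt()
    sig_unc = sig.var(dim=0, keepdim=True).add(eps).sqrt()
    new_mu = mu + torch.randn_like(mu) * mu_unc
    new_sig = sig + torch.randn_like(sig) * sig_unc
    # Re-normalize the features with the resampled statistics.
    return (x - mu) / sig * new_sig + new_mu
```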
- Three rates of convergence or separation via U-statistics in a dependent framework [5.929956715430167]
We put a new concentration result for U-statistics of dependent data into action by pushing further the current state of knowledge in three different active fields of research.
First, we establish a new exponential inequality for the estimation of spectra of trace class integral operators with MCMC methods.
In addition, we investigate generalization performance of online algorithms working with pairwise loss functions and Markov chain samples.
arXiv Detail & Related papers (2021-06-24T07:10:36Z)
- Learning Log-Determinant Divergences for Positive Definite Matrices [47.61701711840848]
In this paper, we propose to learn similarity measures in a data-driven manner.
We capitalize on the αβ-log-det divergence, which is a meta-divergence parametrized by scalars α and β.
Our key idea is to cast these parameters in a continuum and learn them from data (the divergence itself is sketched below).
arXiv Detail & Related papers (2021-04-13T19:09:43Z)
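The paper's contribution is learning α and β from data. As background, here is a sketch of the divergence itself, assuming the standard Cichocki-Cruces-Amari form written through the generalized eigenvalues of the matrix pair.

```python
import numpy as np
from scipy.linalg import eigh

def ab_logdet_divergence(P, Q, alpha=0.5, beta=0.5):
    # Alpha-beta log-det divergence between SPD matrices P and Q,
    # expressed via the eigenvalues lam of the pencil P v = lam Q v,
    # i.e. the spectrum of P Q^{-1}. Requires alpha != 0, beta != 0,
    # and alpha + beta != 0.
    lam = eigh(P, Q, eigvals_only=True)
    vals = (alpha * lam ** beta + beta * lam ** (-alpha)) / (alpha + beta)
    return float(np.log(vals).sum() / (alpha * beta))
```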
- Causal learning with sufficient statistics: an information bottleneck approach [3.720546514089338]
Methods extracting causal information from conditional independencies between variables of a system are common.
We capitalize on the fact that the laws governing the generative mechanisms of a system often result in substructures embodied in the generative functional equation of a variable.
We propose to use the Information Bottleneck method, a technique commonly applied for dimensionality reduction, to find underlying sufficient sets of statistics.
arXiv Detail & Related papers (2020-10-12T00:20:01Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the information above and is not responsible for any consequences of its use.