Neural Methods for Point-wise Dependency Estimation
- URL: http://arxiv.org/abs/2006.05553v4
- Date: Thu, 15 Oct 2020 03:55:10 GMT
- Title: Neural Methods for Point-wise Dependency Estimation
- Authors: Yao-Hung Hubert Tsai, Han Zhao, Makoto Yamada, Louis-Philippe Morency,
Ruslan Salakhutdinov
- Abstract summary: We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
- Score: 129.93860669802046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since its inception, the neural estimation of mutual information (MI) has
demonstrated the empirical success of modeling expected dependency between
high-dimensional random variables. However, MI is an aggregate statistic and
cannot be used to measure point-wise dependency between different events. In
this work, instead of estimating the expected dependency, we focus on
estimating point-wise dependency (PD), which quantitatively measures how likely
two outcomes co-occur. We show that we can naturally obtain PD when we are
optimizing MI neural variational bounds. However, optimizing these bounds is
challenging due to its large variance in practice. To address this issue, we
develop two methods (free of optimizing MI variational bounds): Probabilistic
Classifier and Density-Ratio Fitting. We demonstrate the effectiveness of our
approaches in 1) MI estimation, 2) self-supervised representation learning, and
3) cross-modal retrieval task.
Related papers
- Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.
The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.
The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z) - MINDE: Mutual Information Neural Diffusion Estimation [7.399561232927219]
We present a new method for the estimation of Mutual Information (MI) between random variables.
We use score-based diffusion models to estimate the Kullback Leibler divergence between two densities as a difference between their score functions.
As a by-product, our method also enables the estimation of the entropy of random variables.
arXiv Detail & Related papers (2023-10-13T11:47:41Z) - Max-Sliced Mutual Information [17.667315953598788]
Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference.
Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure.
This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI)
arXiv Detail & Related papers (2023-09-28T06:49:25Z) - Diffeomorphic Information Neural Estimation [2.566492438263125]
Mutual Information (MI) and Conditional Mutual Information (CMI) are multi-purpose tools from information theory.
We introduce DINE (Diffeomorphic Information Neural Estorimator)-a novel approach for estimating CMI of continuous random variables.
We show that the variables of interest can be replaced with appropriate surrogates that follow simpler distributions.
arXiv Detail & Related papers (2022-11-20T03:03:56Z) - Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in
Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP)
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z) - Reducing the Variance of Variational Estimates of Mutual Information by
Limiting the Critic's Hypothesis Space to RKHS [0.0]
Mutual information (MI) is an information-theoretic measure of dependency between two random variables.
Recent methods realize parametric probability distributions or critic as a neural network to approximate unknown density ratios.
We argue that the high variance characteristic is due to the uncontrolled complexity of the critic's hypothesis space.
arXiv Detail & Related papers (2020-11-17T14:32:48Z) - DEMI: Discriminative Estimator of Mutual Information [5.248805627195347]
Estimating mutual information between continuous random variables is often intractable and challenging for high-dimensional data.
Recent progress has leveraged neural networks to optimize variational lower bounds on mutual information.
Our approach is based on training a classifier that provides the probability that a data sample pair is drawn from the joint distribution.
arXiv Detail & Related papers (2020-10-05T04:19:27Z) - Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE)
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.