Neural Methods for Point-wise Dependency Estimation
- URL: http://arxiv.org/abs/2006.05553v4
- Date: Thu, 15 Oct 2020 03:55:10 GMT
- Title: Neural Methods for Point-wise Dependency Estimation
- Authors: Yao-Hung Hubert Tsai, Han Zhao, Makoto Yamada, Louis-Philippe Morency,
Ruslan Salakhutdinov
- Abstract summary: We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) a cross-modal retrieval task.
- Score: 129.93860669802046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since its inception, the neural estimation of mutual information (MI) has
demonstrated the empirical success of modeling expected dependency between
high-dimensional random variables. However, MI is an aggregate statistic and
cannot be used to measure point-wise dependency between different events. In
this work, instead of estimating the expected dependency, we focus on
estimating point-wise dependency (PD), which quantitatively measures how likely
two outcomes co-occur. We show that we can naturally obtain PD when we are
optimizing MI neural variational bounds. However, optimizing these bounds is
challenging in practice due to their large variance. To address this issue, we
develop two methods (free of optimizing MI variational bounds): Probabilistic
Classifier and Density-Ratio Fitting. We demonstrate the effectiveness of our
approaches in 1) MI estimation, 2) self-supervised representation learning, and
3) a cross-modal retrieval task.
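As a concrete illustration of the Probabilistic Classifier method, the minimal PyTorch sketch below (illustrative names and architecture, not the authors' code) trains a binary classifier to distinguish jointly drawn pairs from shuffled pairs drawn from the product of marginals; with balanced classes, the classifier's logit directly estimates log PD(x, y) = log [p(x, y) / (p(x)p(y))]. Density-Ratio Fitting would instead fit the ratio with a least-squares objective rather than a cross-entropy loss.

```python
import torch
import torch.nn as nn

class PDCritic(nn.Module):
    """Binary classifier over (x, y) pairs; its logit estimates log PD(x, y)."""
    def __init__(self, dim_x, dim_y, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + dim_y, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def train_step(model, optimizer, x, y):
    # Positives keep the pairing (joint distribution); negatives shuffle y
    # within the batch (product of marginals), giving balanced classes.
    y_shuffled = y[torch.randperm(y.size(0))]
    logits = torch.cat([model(x, y), model(x, y_shuffled)])
    labels = torch.cat([torch.ones(x.size(0)), torch.zeros(x.size(0))])
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def pointwise_dependency(model, x, y):
    # With balanced classes, p(joint | x, y) / p(product | x, y) equals
    # PD(x, y), so the classifier's logit is the log of that odds ratio.
    return torch.exp(model(x, y))
```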
Related papers
- Quantifying Emergence in Large Language Models [31.608080868988825]
We propose a quantifiable solution for estimating the emergence of LLMs.
Inspired by emergentism in dynamics, we quantify the strength of emergence by comparing the entropy reduction of the macroscopic (semantic) level with that of the microscopic (token) level.
Our method demonstrates consistent behaviors across a suite of LMs under both in-context learning (ICL) and natural sentences.
arXiv Detail & Related papers (2024-05-21T09:12:20Z)
- MINDE: Mutual Information Neural Diffusion Estimation [7.399561232927219]
We present a new method for the estimation of Mutual Information (MI) between random variables.
We use score-based diffusion models to estimate the Kullback-Leibler divergence between two densities as a difference between their score functions.
As a by-product, our method also enables the estimation of the entropy of random variables.
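Schematically, the identity behind this approach (stated here under the simplifying assumptions of a shared forward diffusion with coefficient g(t) and a long horizon T, so that both terminal distributions match the Gaussian prior) expresses the KL divergence through the scores:

```latex
\[
\mathrm{KL}(p \,\|\, q) \approx \frac{1}{2} \int_0^T g(t)^2\,
\mathbb{E}_{x_t \sim p_t}\!\left[\bigl\| \nabla_{x_t}\log p_t(x_t)
- \nabla_{x_t}\log q_t(x_t) \bigr\\|^2\right] dt,
\]
```

with the true scores replaced by learned denoising score networks.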
arXiv Detail & Related papers (2023-10-13T11:47:41Z)
- Max-Sliced Mutual Information [17.667315953598788]
Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference.
Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure.
This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI).
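In symbols (following the paper's definition, with our notation for the unit spheres), mSMI maximizes the mutual information between one-dimensional projections:

```latex
\[
\mathrm{mSMI}(X; Y) = \max_{\theta \in \mathbb{S}^{d_x-1},\; \phi \in \mathbb{S}^{d_y-1}}
I\!\left(\theta^{\top} X;\, \phi^{\top} Y\right),
\]
```

retaining CCA's scalability while inheriting MI's ability to capture nonlinear dependence along the chosen projections.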
arXiv Detail & Related papers (2023-09-28T06:49:25Z)
- Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
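A brief usage sketch with the open-source ngboost package follows (shown with a univariate Normal for brevity; the paper's contribution targets a multivariate predictive distribution, and the exact API may differ across package versions):

```python
import numpy as np
from ngboost import NGBRegressor
from ngboost.distns import Normal

# Toy data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=500)

# Fit boosted estimates of the distribution's parameters; natural gradients
# make the updates invariant to how the distribution is parameterized.
ngb = NGBRegressor(Dist=Normal, n_estimators=200).fit(X, y)
dist = ngb.pred_dist(X[:5])                      # per-example distributions
print(dist.params["loc"], dist.params["scale"])  # predictive means and stds
```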
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- Reducing the Variance of Variational Estimates of Mutual Information by Limiting the Critic's Hypothesis Space to RKHS [0.0]
Mutual information (MI) is an information-theoretic measure of dependency between two random variables.
Recent methods realize the parametric probability distributions, or the critic, as a neural network to approximate unknown density ratios.
We argue that the high variance characteristic is due to the uncontrolled complexity of the critic's hypothesis space.
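One way to realize this idea (a hedged sketch, using random Fourier features to approximate a Gaussian-kernel RKHS; not necessarily the paper's exact construction) is to train only a linear head over fixed kernel features inside the Donsker-Varadhan bound:

```python
import math
import torch

class RFFCritic(torch.nn.Module):
    """Critic restricted to an approximate Gaussian-kernel RKHS: random
    Fourier features are fixed and only the linear head is trained."""
    def __init__(self, dim, n_features=128, lengthscale=1.0):
        super().__init__()
        self.register_buffer("W", torch.randn(dim, n_features) / lengthscale)
        self.register_buffer("b", 2 * math.pi * torch.rand(n_features))
        self.head = torch.nn.Linear(n_features, 1, bias=False)

    def forward(self, xy):
        feats = math.sqrt(2.0 / self.W.shape[1]) * torch.cos(xy @ self.W + self.b)
        return self.head(feats).squeeze(-1)

def dv_lower_bound(critic, x, y):
    # Donsker-Varadhan bound: E_joint[f] - log E_product[exp(f)].
    joint = critic(torch.cat([x, y], dim=-1)).mean()
    y_shuffled = y[torch.randperm(y.size(0))]
    log_mean_exp = (torch.logsumexp(critic(torch.cat([x, y_shuffled], dim=-1)), dim=0)
                    - math.log(x.size(0)))
    return joint - log_mean_exp
```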
arXiv Detail & Related papers (2020-11-17T14:32:48Z)
- DEMI: Discriminative Estimator of Mutual Information [5.248805627195347]
Estimating mutual information between continuous random variables is often intractable and challenging for high-dimensional data.
Recent progress has leveraged neural networks to optimize variational lower bounds on mutual information.
Our approach is based on training a classifier that provides the probability that a data sample pair is drawn from the joint distribution.
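Schematically, with d(x, y) denoting the classifier's probability that a pair comes from the joint (and balanced sampling of joint and product pairs), the induced estimate takes the density-ratio form

```latex
\[
\widehat{I}(X; Y) = \mathbb{E}_{p(x,y)}\!\left[\log \frac{d(x, y)}{1 - d(x, y)}\right].
\]
```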
arXiv Detail & Related papers (2020-10-05T04:19:27Z)
- Rethink Maximum Mean Discrepancy for Domain Adaptation [77.2560592127872]
This paper theoretically proves two essential facts: 1) minimizing the Maximum Mean Discrepancy is equivalent to maximizing the source and target intra-class distances respectively while jointly minimizing their variances with some implicit weights, so that feature discriminability degrades.
Experiments on several benchmark datasets not only confirm the validity of the theoretical results but also demonstrate that our approach can substantially outperform comparative state-of-the-art methods.
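For reference, a minimal estimate of the squared MMD between source and target feature batches under a Gaussian kernel (a generic biased estimator, not this paper's reweighted variant):

```python
import torch

def mmd2_biased(xs, xt, sigma=1.0):
    """Biased estimate of MMD^2 between two batches with a Gaussian kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(xs, xs).mean() + k(xt, xt).mean() - 2 * k(xs, xt).mean()
```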
arXiv Detail & Related papers (2020-07-01T18:25:10Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators of the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
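A hedged sketch of one such estimator follows: a two-fold cross-fit AIPW (doubly-robust) estimate of the ACE with random-forest nuisance models, where the function and variable names are illustrative rather than taken from the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def crossfit_aipw(X, a, y, n_splits=2, seed=0):
    """Cross-fit AIPW estimate of the ACE: nuisance models are fit on one
    fold and evaluated on the held-out fold, then the folds are swapped."""
    psi = np.zeros(len(y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Propensity score e(x) = P(A=1 | X=x), clipped away from 0 and 1.
        ps = RandomForestClassifier(random_state=seed).fit(X[train], a[train])
        e = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)
        # Outcome models mu_1(x), mu_0(x) fit on treated / control units.
        m1 = RandomForestRegressor(random_state=seed).fit(
            X[train][a[train] == 1], y[train][a[train] == 1])
        m0 = RandomForestRegressor(random_state=seed).fit(
            X[train][a[train] == 0], y[train][a[train] == 0])
        mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
        # Doubly-robust influence-function values on the held-out fold.
        psi[test] = (mu1 - mu0
                     + a[test] * (y[test] - mu1) / e
                     - (1 - a[test]) * (y[test] - mu0) / (1 - e))
    return psi.mean()
```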
arXiv Detail & Related papers (2020-04-21T23:09:55Z)