Related papers: Neural Methods for Point-wise Dependency Estimation

Neural Methods for Point-wise Dependency Estimation

URL: http://arxiv.org/abs/2006.05553v4
Date: Thu, 15 Oct 2020 03:55:10 GMT
Title: Neural Methods for Point-wise Dependency Estimation
Authors: Yao-Hung Hubert Tsai, Han Zhao, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
Abstract summary: We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur. We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
Score: 129.93860669802046
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Since its inception, the neural estimation of mutual information (MI) has demonstrated the empirical success of modeling expected dependency between high-dimensional random variables. However, MI is an aggregate statistic and cannot be used to measure point-wise dependency between different events. In this work, instead of estimating the expected dependency, we focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur. We show that we can naturally obtain PD when we are optimizing MI neural variational bounds. However, optimizing these bounds is challenging due to its large variance in practice. To address this issue, we develop two methods (free of optimizing MI variational bounds): Probabilistic Classifier and Density-Ratio Fitting. We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.

Related papers

Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
Quantifying Emergence in Large Language Models [31.608080868988825]
We propose a quantifiable solution for estimating emergence of LLMs. Inspired by emergentism in dynamics, we quantify the strength of emergence by comparing the entropy reduction of the macroscopic (semantic) level with that of the microscopic (token) level. Our method demonstrates consistent behaviors across a suite of LMs under both in-context learning (ICL) and natural sentences.
arXiv Detail & Related papers (2024-05-21T09:12:20Z)
MINDE: Mutual Information Neural Diffusion Estimation [7.399561232927219]
We present a new method for the estimation of Mutual Information (MI) between random variables. We use score-based diffusion models to estimate the Kullback Leibler divergence between two densities as a difference between their score functions. As a by-product, our method also enables the estimation of the entropy of random variables.
arXiv Detail & Related papers (2023-10-13T11:47:41Z)
Max-Sliced Mutual Information [17.667315953598788]
Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference. Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure. This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI)
arXiv Detail & Related papers (2023-09-28T06:49:25Z)
Diffeomorphic Information Neural Estimation [2.566492438263125]
Mutual Information (MI) and Conditional Mutual Information (CMI) are multi-purpose tools from information theory. We introduce DINE (Diffeomorphic Information Neural Estorimator)-a novel approach for estimating CMI of continuous random variables. We show that the variables of interest can be replaced with appropriate surrogates that follow simpler distributions.
arXiv Detail & Related papers (2022-11-20T03:03:56Z)
Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors. Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP) We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z)
Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues. We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders. We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution. Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
Reducing the Variance of Variational Estimates of Mutual Information by Limiting the Critic's Hypothesis Space to RKHS [0.0]
Mutual information (MI) is an information-theoretic measure of dependency between two random variables. Recent methods realize parametric probability distributions or critic as a neural network to approximate unknown density ratios. We argue that the high variance characteristic is due to the uncontrolled complexity of the critic's hypothesis space.
arXiv Detail & Related papers (2020-11-17T14:32:48Z)
DEMI: Discriminative Estimator of Mutual Information [5.248805627195347]
Estimating mutual information between continuous random variables is often intractable and challenging for high-dimensional data. Recent progress has leveraged neural networks to optimize variational lower bounds on mutual information. Our approach is based on training a classifier that provides the probability that a data sample pair is drawn from the joint distribution.
arXiv Detail & Related papers (2020-10-05T04:19:27Z)
Rethink Maximum Mean Discrepancy for Domain Adaptation [77.2560592127872]
This paper theoretically proves two essential facts: 1) minimizing the Maximum Mean Discrepancy equals to maximize the source and target intra-class distances respectively but jointly minimize their variance with some implicit weights, so that the feature discriminability degrades. Experiments on several benchmark datasets not only prove the validity of theoretical results but also demonstrate that our approach could perform better than the comparative state-of-art methods substantially.
arXiv Detail & Related papers (2020-07-01T18:25:10Z)
Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties. We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE) When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.