Reliable Estimation of KL Divergence using a Discriminator in
Reproducing Kernel Hilbert Space
- URL: http://arxiv.org/abs/2109.14688v1
- Date: Wed, 29 Sep 2021 19:50:06 GMT
- Title: Reliable Estimation of KL Divergence using a Discriminator in
Reproducing Kernel Hilbert Space
- Authors: Sandesh Ghimire, Aria Masoomi and Jennifer Dy
- Abstract summary: Estimating Kullback Leibler (KL) divergence from samples of two distributions is essential in many machine learning problems.
Variational methods using a neural network discriminator have been proposed to achieve this task in a scalable manner.
- Score: 1.8906500245637619
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating Kullback Leibler (KL) divergence from samples of two distributions
is essential in many machine learning problems. Variational methods using a
neural network discriminator have been proposed to achieve this task in a
scalable manner. However, we note that most of these methods suffer from high
fluctuations (variance) in their estimates and from instability in training.
In this paper, we look at this issue from the perspective of statistical
learning theory and function space complexity to understand why this happens
and how to solve it. We argue that the cause of these pathologies is a lack of
control over the complexity of the neural network discriminator function, and
that they can be mitigated by controlling it. To achieve
this objective, we 1) present a novel construction of the discriminator in the
Reproducing Kernel Hilbert Space (RKHS), 2) theoretically relate the error
probability bound of the KL estimates to the complexity of the discriminator in
the RKHS, 3) present a scalable way to control the complexity (RKHS norm) of
the discriminator for reliable estimation of KL divergence, and 4) prove
the consistency of the proposed estimator. In three different applications of
KL divergence: KL estimation, mutual information estimation, and Variational
Bayes, we show that by controlling the complexity as developed in the theory,
we are able to reduce the variance of the KL estimates and stabilize training.
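To make the recipe above concrete, the following is a minimal, hedged sketch of variational KL estimation with a complexity-controlled critic: it maximizes the Donsker-Varadhan lower bound KL(P||Q) >= E_P[f] - log E_Q[e^f] over a critic realized with random Fourier features (an approximate RKHS function) and penalizes the squared weight norm as a proxy for the RKHS norm. The kernel, feature dimension, penalty strength, and optimizer settings are illustrative assumptions, not the authors' construction.

```python
# Hedged sketch: Donsker-Varadhan KL estimation with a random-Fourier-feature
# critic whose squared weight norm (a proxy for the RKHS norm) is penalized.
# Illustrative only -- not the paper's actual construction or hyperparameters.
import math
import torch

torch.manual_seed(0)

# Two 1-D Gaussians with a known closed-form KL for reference.
p_mu, p_sigma = 0.0, 1.0
q_mu, q_sigma = 1.0, 1.5
true_kl = (math.log(q_sigma / p_sigma)
           + (p_sigma**2 + (p_mu - q_mu)**2) / (2 * q_sigma**2) - 0.5)

# Random Fourier features approximating a Gaussian (RBF) kernel.
D, bandwidth = 256, 1.0                     # feature dimension, bandwidth (assumed)
W = torch.randn(1, D) / bandwidth
b = 2 * math.pi * torch.rand(D)

def features(x):                            # x: (n, 1) -> (n, D)
    return math.sqrt(2.0 / D) * torch.cos(x @ W + b)

w = torch.zeros(D, requires_grad=True)      # critic f(x) = <w, phi(x)>
opt = torch.optim.Adam([w], lr=1e-2)
lam = 0.1                                   # norm-penalty strength (assumed)

for step in range(2000):
    xp = p_mu + p_sigma * torch.randn(512, 1)   # samples from P
    xq = q_mu + q_sigma * torch.randn(512, 1)   # samples from Q
    f_p, f_q = features(xp) @ w, features(xq) @ w
    # Donsker-Varadhan lower bound: E_P[f] - log E_Q[exp(f)]
    dv = f_p.mean() - (torch.logsumexp(f_q, dim=0) - math.log(len(f_q)))
    # Maximize the bound while keeping the critic's complexity under control.
    loss = -dv + lam * w.pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Evaluate the trained critic on a fresh, larger sample.
with torch.no_grad():
    xp = p_mu + p_sigma * torch.randn(20000, 1)
    xq = q_mu + q_sigma * torch.randn(20000, 1)
    f_p, f_q = features(xp) @ w, features(xq) @ w
    est = f_p.mean() - (torch.logsumexp(f_q, dim=0) - math.log(len(f_q)))

print(f"estimated KL ~ {est.item():.3f}   closed-form KL = {true_kl:.3f}")
```

Raising lam shrinks the effective hypothesis space (smaller RKHS-norm critics), which is the kind of control the abstract argues reduces the variance of the estimates and stabilizes training.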
Related papers
- Better Estimation of the KL Divergence Between Language Models [58.7977683502207]
Estimating the Kullback--Leibler (KL) divergence between language models has many applications.
We introduce a Rao--Blackwellized estimator that is also unbiased and provably has variance less than or equal to that of the standard Monte Carlo estimator (a toy numerical sketch of this idea appears after this list).
arXiv Detail & Related papers (2025-04-14T18:40:02Z)
- Kolmogorov-Smirnov GAN [52.36633001046723]
We propose a novel deep generative model, the Kolmogorov-Smirnov Generative Adversarial Network (KSGAN)
Unlike existing approaches, KSGAN formulates the learning process as a minimization of the Kolmogorov-Smirnov (KS) distance.
arXiv Detail & Related papers (2024-06-28T14:30:14Z)
- Sinkhorn Distance Minimization for Knowledge Distillation [97.64216712016571]
Knowledge distillation (KD) has been widely adopted to compress large language models (LLMs)
In this paper, we show that the aforementioned KL, RKL, and JS divergences respectively suffer from issues of mode-averaging, mode-collapsing, and mode-underestimation.
We propose the Sinkhorn Knowledge Distillation (SinKD) that exploits the Sinkhorn distance to ensure a nuanced and precise assessment of the disparity between teacher and student distributions.
arXiv Detail & Related papers (2024-02-27T01:13:58Z)
- Variational Inference of overparameterized Bayesian Neural Networks: a theoretical and empirical study [27.86555142135798]
This paper studies the Variational Inference (VI) used for training Bayesian Neural Networks (BNNs)
We point out a critical issue in mean-field VI training.
This problem arises from the decomposition of the evidence lower bound (ELBO) into two terms.
arXiv Detail & Related papers (2022-07-08T12:31:08Z)
- Conjugate Gradient Method for Generative Adversarial Networks [0.0]
It is not feasible to directly calculate the Jensen-Shannon divergence between the density function of the data and the density function of a deep neural network model.
Generative adversarial networks (GANs) can be used to formulate this problem as a discriminative problem with two models, a generator and a discriminator.
We propose to apply the conjugate gradient method to solve the local Nash equilibrium problem in GANs.
arXiv Detail & Related papers (2022-03-28T04:44:45Z)
- Variance Minimization in the Wasserstein Space for Invariant Causal Prediction [72.13445677280792]
In this work, we show that the approach taken in ICP may be reformulated as a series of nonparametric tests that scales linearly in the number of predictors.
Each of these tests relies on the minimization of a novel loss function that is derived from tools in optimal transport theory.
We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms.
arXiv Detail & Related papers (2021-10-13T22:30:47Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains where the problem data is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
- Fundamental Limits and Tradeoffs in Invariant Representation Learning [99.2368462915979]
Many machine learning applications involve learning representations that achieve two competing goals.
A minimax game-theoretic formulation exposes a fundamental tradeoff between accuracy and invariance.
We provide an information-theoretic analysis of this general and important problem under both classification and regression settings.
arXiv Detail & Related papers (2020-12-19T15:24:04Z)
- Reducing the Variance of Variational Estimates of Mutual Information by Limiting the Critic's Hypothesis Space to RKHS [0.0]
Mutual information (MI) is an information-theoretic measure of dependency between two random variables.
Recent methods realize the parametric probability distributions or the critic as a neural network to approximate unknown density ratios.
We argue that the high variance characteristic is due to the uncontrolled complexity of the critic's hypothesis space.
arXiv Detail & Related papers (2020-11-17T14:32:48Z)
- Forward and inverse reinforcement learning sharing network weights and hyperparameters [3.705785916791345]
ERIL combines forward and inverse reinforcement learning (RL) under the framework of an entropy-regularized Markov decision process.
A forward RL step minimizes the reverse KL estimated by the inverse RL step.
We show that minimizing the reverse KL divergence is equivalent to finding an optimal policy.
arXiv Detail & Related papers (2020-08-17T13:12:44Z)
- Analysis of Discriminator in RKHS Function Space for Kullback-Leibler Divergence Estimation [5.146375037973682]
We study a generative adversarial network based approach that uses a neural network discriminator to estimate Kullback Leibler (KL) divergence.
We argue that high fluctuations in the estimates are a consequence of not controlling the complexity of the discriminator function space.
arXiv Detail & Related papers (2020-02-25T21:44:52Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs)
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss which performs better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvements on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
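As a toy illustration of the Rao-Blackwellized estimator described in the first related paper above ("Better Estimation of the KL Divergence Between Language Models"), the sketch below compares a naive Monte Carlo estimate of the KL divergence between two tiny two-step autoregressive models with a Rao-Blackwellized variant that, for each sampled prefix, computes the exact per-step KL over the whole vocabulary. Both estimators are unbiased, and the Rao-Blackwellized one typically has much lower per-sample variance (the cited paper proves its variance is at most that of the Monte Carlo estimator for its construction). The toy models, sample sizes, and variable names are assumptions for illustration; the cited paper's estimator may differ in its details.

```python
# Hedged illustration: naive Monte Carlo vs. Rao-Blackwellized KL estimation
# between two tiny two-step autoregressive "language models".
import numpy as np

rng = np.random.default_rng(0)
V = 4  # toy vocabulary size (assumed)

def random_lm(seed):
    """A toy 2-step LM: a distribution over the first token and, for each
    first token, a conditional distribution over the second token."""
    r = np.random.default_rng(seed)
    first = r.dirichlet(np.ones(V))
    second = r.dirichlet(np.ones(V), size=V)   # row t = P(x2 | x1 = t)
    return first, second

p1, p2 = random_lm(1)   # model P
q1, q2 = random_lm(2)   # model Q

def exact_kl(p, q):
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Ground truth: step-1 KL plus the step-2 KL averaged over prefixes drawn from P.
true_kl = exact_kl(p1, q1) + sum(p1[t] * exact_kl(p2[t], q2[t]) for t in range(V))

n = 2000
# Naive Monte Carlo: sample whole sequences from P, average log P(seq)/Q(seq).
x1 = rng.choice(V, size=n, p=p1)
x2 = np.array([rng.choice(V, p=p2[t]) for t in x1])
mc = (np.log(p1[x1]) - np.log(q1[x1])) + (np.log(p2[x1, x2]) - np.log(q2[x1, x2]))

# Rao-Blackwellized: sample only the prefix and take the exact per-step KL
# over the whole vocabulary given that prefix (same mean, reduced variance).
rb = np.array([exact_kl(p1, q1) + exact_kl(p2[t], q2[t]) for t in x1])

print(f"true KL            : {true_kl:.4f}")
print(f"Monte Carlo        : {mc.mean():.4f}  (std of per-sample terms {mc.std():.4f})")
print(f"Rao-Blackwellized  : {rb.mean():.4f}  (std of per-sample terms {rb.std():.4f})")
```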
This list is automatically generated from the titles and abstracts of the papers on this site.