Unified Perspective on Probability Divergence via Maximum Likelihood Density Ratio Estimation: Bridging KL-Divergence and Integral Probability Metrics
- URL: http://arxiv.org/abs/2201.13127v1
- Date: Mon, 31 Jan 2022 11:15:04 GMT
- Title: Unified Perspective on Probability Divergence via Maximum Likelihood Density Ratio Estimation: Bridging KL-Divergence and Integral Probability Metrics
- Authors: Masahiro Kato and Masaaki Imaizumi and Kentaro Minami
- Abstract summary: We show that the KL-divergence and the IPMs can be represented as maximal likelihoods differing only by sampling schemes.
We propose a novel class of probability divergences, called the Density Ratio Metrics (DRMs), that interpolates the KL-divergence and the IPMs.
In addition to these findings, we also introduce some applications of the DRMs, such as DRE and generative adversarial networks.
- Score: 15.437224275494838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper provides a unified perspective for the Kullback-Leibler
(KL)-divergence and the integral probability metrics (IPMs) from the
perspective of maximum likelihood density-ratio estimation (DRE). Both the
KL-divergence and the IPMs are widely used in applications such as generative
modeling. However, a unified understanding of these concepts has remained
unexplored. In this paper, we show that the KL-divergence and
the IPMs can be represented as maximal likelihoods differing only by sampling
schemes, and use this result to derive a unified form of the IPMs and a relaxed
estimation method. To develop the estimation problem, we construct an
unconstrained maximum likelihood estimator to perform DRE with a stratified
sampling scheme. We further propose a novel class of probability divergences,
called the Density Ratio Metrics (DRMs), that interpolates the KL-divergence
and the IPMs. In addition to these findings, we also introduce some
applications of the DRMs, such as DRE and generative adversarial networks. In
experiments, we validate the effectiveness of our proposed methods.
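The link between the KL-divergence and likelihood-based density-ratio estimation can be illustrated with a standard variational identity: for densities p and q, KL(p||q) = sup_r { E_p[log r(X)] - E_q[r(X)] + 1 }, with the supremum attained at r = p/q. The sketch below fits a log-linear ratio model r(x) = exp(a + b*x) by gradient ascent on the sample version of this objective, using p = N(1, 1) and q = N(0, 1), for which the true ratio is exp(x - 1/2) and KL(p||q) = 0.5. It is a generic unconstrained likelihood-type DRE sketch under these assumptions; the function names, model class, and step size are illustrative, and it is not the stratified-sampling estimator or the DRM construction proposed in the paper.

```python
import math
import random


def fit_log_linear_ratio(xs_p, xs_q, lr=0.1, steps=1000):
    """Fit r(x) = exp(a + b*x) to p(x)/q(x) by gradient ascent on the concave
    objective J(a, b) = mean_p[a + b*x] - mean_q[exp(a + b*x)] + 1."""
    a, b = 0.0, 0.0
    mean_p_x = sum(xs_p) / len(xs_p)
    n_q = len(xs_q)
    for _ in range(steps):
        r_q = [math.exp(a + b * x) for x in xs_q]
        grad_a = 1.0 - sum(r_q) / n_q
        grad_b = mean_p_x - sum(r * x for r, x in zip(r_q, xs_q)) / n_q
        a += lr * grad_a
        b += lr * grad_b
    return a, b


def kl_estimate(a, b, xs_p, xs_q):
    """Evaluate J at the fitted ratio; this plug-in value estimates KL(p || q)."""
    mean_log_r_p = sum(a + b * x for x in xs_p) / len(xs_p)
    mean_r_q = sum(math.exp(a + b * x) for x in xs_q) / len(xs_q)
    return mean_log_r_p - mean_r_q + 1.0


rng = random.Random(0)
xs_p = [rng.gauss(1.0, 1.0) for _ in range(1000)]  # samples from p = N(1, 1)
xs_q = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # samples from q = N(0, 1)
a, b = fit_log_linear_ratio(xs_p, xs_q)
kl_hat = kl_estimate(a, b, xs_p, xs_q)
print(f"a = {a:.3f} (true -0.5), b = {b:.3f} (true 1.0), KL estimate = {kl_hat:.3f} (true 0.5)")
```

Plain gradient ascent is used only for transparency; because J is concave in (a, b), a Newton step or any convex solver would work as well.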
Related papers
- A Likelihood Based Approach to Distribution Regression Using Conditional Deep Generative Models [6.647819824559201]
We study the large-sample properties of a likelihood-based approach for estimating conditional deep generative models.
Our results lead to the convergence rate of a sieve maximum likelihood estimator for estimating the conditional distribution.
(2024-10-02T20:46:21Z)
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
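For intuition about the proximal step that SPPM-type methods build on, the sketch below applies the basic stochastic proximal point update x_{k+1} = argmin_x { f_i(x) + ||x - x_k||^2 / (2*gamma) } to a least-squares problem with f_i(x) = 0.5*(a_i . x - b_i)^2, where the proximal step has the closed form x_{k+1} = x_k - gamma*(a_i . x_k - b_i)/(1 + gamma*||a_i||^2) * a_i. The data, step sizes gamma, and function names are illustrative assumptions; the sketch does not reproduce the paper's variants or its nonsmooth analysis.

```python
import random


def sppm_least_squares(rows, targets, gamma, steps, seed=0):
    """Stochastic proximal point method for f(x) = mean_i 0.5 * (a_i . x - b_i)^2.

    Each step solves argmin_x 0.5 * (a_i . x - b_i)^2 + ||x - x_k||^2 / (2 * gamma)
    exactly; for a squared loss this has the closed form used below.
    """
    rng = random.Random(seed)
    x = [0.0] * len(rows[0])
    for _ in range(steps):
        i = rng.randrange(len(rows))
        a, b = rows[i], targets[i]
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        norm_sq = sum(ai * ai for ai in a)
        scale = gamma * residual / (1.0 + gamma * norm_sq)
        x = [xi - scale * ai for xi, ai in zip(x, a)]
    return x


# A consistent 2-D linear system whose solution is x* = (1, -2).
data_rng = random.Random(1)
x_star = [1.0, -2.0]
rows = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(50)]
targets = [sum(ai * xi for ai, xi in zip(a, x_star)) for a in rows]

for gamma in (0.1, 1.0, 100.0):
    x_hat = sppm_least_squares(rows, targets, gamma, steps=2000)
    print(gamma, [round(v, 6) for v in x_hat])
```

On this easy, consistent problem the step size can span three orders of magnitude: each update shrinks the residual on row i by the factor 1/(1 + gamma*||a_i||^2) and never flips its sign, whereas a plain SGD step x_k - gamma*(a_i . x_k - b_i)*a_i overshoots once gamma*||a_i||^2 > 2.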
(2024-05-24T21:09:19Z)
- Risk Bounds for Mixture Density Estimation on Compact Domains via the $h$-Lifted Kullback--Leibler Divergence [2.8074364079901017]
We introduce the $h$-lifted Kullback--Leibler (KL) divergence as a generalization of the standard KL divergence.
We develop a procedure for the computation of the corresponding maximum $h$-lifted likelihood estimators.
(2024-04-19T02:31:34Z)
- On the Consistency of Maximum Likelihood Estimation of Probabilistic Principal Component Analysis [1.0528389538549636]
PPCA has a broad spectrum of applications ranging from science and engineering to quantitative finance.
Despite this wide applicability in various fields, hardly any theoretical guarantees exist to justify the soundness of the maximum likelihood (ML) solution for this model.
We propose a novel approach using quotient topological spaces and in particular, we show that the maximum likelihood solution is consistent in an appropriate quotient Euclidean space.
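The non-identifiability that motivates a quotient-space view is visible in the classical closed-form ML solution of PPCA due to Tipping and Bishop: sigma^2 is the average of the discarded sample-covariance eigenvalues and W = U_q (Lambda_q - sigma^2 I)^{1/2} R for an arbitrary orthogonal R. The sketch below handles only 2-D data with one latent dimension (so R = +1 or -1), computes the 2x2 eigendecomposition by hand, and checks that W and -W give the same model covariance W W^T + sigma^2 I. Function names and the synthetic data are assumptions for illustration; this is not the paper's proof technique.

```python
import math
import random


def ppca_ml_2d(points):
    """Closed-form ML PPCA for 2-D data with a single latent dimension (q = 1).

    sigma^2 is the smaller eigenvalue of the sample covariance S (the only
    discarded one) and W = u1 * sqrt(lambda1 - sigma^2), taking R = +1.
    """
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / n
    s11 = sum((p[1] - m1) ** 2 for p in points) / n
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / n
    # Eigenvalues of the symmetric 2x2 matrix S.
    mid = (s00 + s11) / 2
    rad = math.sqrt(((s00 - s11) / 2) ** 2 + s01 ** 2)
    lam1, lam2 = mid + rad, mid - rad
    # Leading eigenvector; the (near-)diagonal case is handled separately.
    if abs(s01) > 1e-12:
        u = (lam1 - s11, s01)
    else:
        u = (1.0, 0.0) if s00 >= s11 else (0.0, 1.0)
    norm = math.hypot(u[0], u[1])
    u = (u[0] / norm, u[1] / norm)
    sigma2 = lam2
    scale = math.sqrt(lam1 - sigma2)
    w = (u[0] * scale, u[1] * scale)
    return w, sigma2, ((s00, s01), (s01, s11))


def model_cov(w, sigma2):
    """Model covariance C = W W^T + sigma^2 I for a 2x1 loading vector W."""
    return ((w[0] * w[0] + sigma2, w[0] * w[1]),
            (w[1] * w[0], w[1] * w[1] + sigma2))


rng = random.Random(0)
points = []
for _ in range(500):
    z = rng.gauss(0.0, 1.0)  # latent variable
    points.append((2.0 * z + 0.3 * rng.gauss(0.0, 1.0),
                   1.0 * z + 0.3 * rng.gauss(0.0, 1.0)))

w, sigma2, cov = ppca_ml_2d(points)
c_plus = model_cov(w, sigma2)
c_minus = model_cov((-w[0], -w[1]), sigma2)  # R = -1: a different W, same model
print("W =", w, "sigma^2 =", sigma2)
print("C(W) == C(-W):", c_plus == c_minus)
```

In this 2-D, single-factor case the fitted model covariance reproduces the sample covariance S exactly, so the likelihood cannot distinguish W from -W; only W W^T is identified.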
(2023-11-08T22:40:45Z)
- Targeted Separation and Convergence with Kernel Discrepancies [61.973643031360254]
Kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or (ii) control weak convergence to P.
In this article we derive new sufficient and necessary conditions to ensure (i) and (ii).
For maximum mean discrepancies (MMDs) on separable metric spaces, we characterize those kernels that separate Bochner-embeddable measures and introduce simple conditions for separating all measures with unbounded kernels.
(2022-09-26T16:41:16Z)
- A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
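As a concrete instance of the binary DRE setting that such frameworks generalize, the sketch below estimates r(x) = p(x)/q(x) by probabilistic classification: a logistic-regression classifier is trained to separate samples of p (label 1) from samples of q (label 0), and with equal sample sizes its logit a + b*x estimates log r(x). Logistic regression is chosen as a familiar example (it is known to be one of the losses covered by Bregman-divergence views of DRE); the names, data, and step size are assumptions, and this is not the multi-distribution method of the paper.

```python
import math
import random


def fit_logistic_ratio(xs_p, xs_q, lr=1.0, steps=500):
    """Logistic regression of label (1 = drawn from p, 0 = drawn from q) on x.

    With equal sample sizes the fitted logit a + b*x estimates log p(x)/q(x).
    """
    data = [(x, 1.0) for x in xs_p] + [(x, 0.0) for x in xs_q]
    n = len(data)
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in data:
            prob = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += (y - prob) / n
            grad_b += (y - prob) * x / n
        # Gradient ascent on the average log-likelihood of the labels.
        a += lr * grad_a
        b += lr * grad_b
    return a, b


rng = random.Random(1)
xs_p = [rng.gauss(1.0, 1.0) for _ in range(1000)]  # p = N(1, 1)
xs_q = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # q = N(0, 1)
a, b = fit_logistic_ratio(xs_p, xs_q)
print(f"log r(x) ~ {a:.3f} + {b:.3f} x   (true: -0.5 + 1.0 x)")
```

With unequal sample sizes n_p and n_q the class prior enters the logit, and the ratio estimate becomes r(x) = (n_q / n_p) * exp(a + b*x).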
(2021-12-07T01:23:20Z)
- Keep it Tighter -- A Story on Analytical Mean Embeddings [0.6445605125467574]
Kernel techniques are among the most popular and flexible approaches in data science.
The mean embedding gives rise to a divergence measure referred to as the maximum mean discrepancy (MMD).
In this paper we focus on the problem of MMD estimation when the mean embedding of one of the underlying distributions is available analytically.
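The setting can be sketched in one dimension for a Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2*ell^2)) and a known target Q = N(0, s^2). There the mean embedding mu_Q(x) = E_{Y~Q} k(x, Y) = ell/sqrt(ell^2 + s^2) * exp(-x^2 / (2*(ell^2 + s^2))) and ||mu_Q||^2 = E k(Y, Y') = ell/sqrt(ell^2 + 2*s^2) are available in closed form, so only the P-part needs samples: MMD^2(P, Q) is estimated by (1/(n(n-1))) sum_{i != j} k(x_i, x_j) - (2/n) sum_i mu_Q(x_i) + ||mu_Q||^2. This plug-in estimator and all names are illustrative assumptions; the estimators studied in the paper may differ.

```python
import math
import random


def gauss_kernel(x, y, ell):
    return math.exp(-((x - y) ** 2) / (2.0 * ell ** 2))


def mmd2_analytic_target(xs, ell=1.0, s=1.0):
    """Estimate MMD^2(P, Q) from samples xs ~ P when Q = N(0, s^2) is known.

    The P-P term is an unbiased U-statistic; the P-Q and Q-Q terms use the
    closed-form mean embedding of Q under the Gaussian kernel.
    """
    n = len(xs)
    pp = sum(gauss_kernel(xs[i], xs[j], ell)
             for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    v = ell ** 2 + s ** 2
    pq = sum(ell / math.sqrt(v) * math.exp(-x * x / (2.0 * v)) for x in xs) / n
    qq = ell / math.sqrt(ell ** 2 + 2.0 * s ** 2)
    return pp - 2.0 * pq + qq


rng = random.Random(0)
same = [rng.gauss(0.0, 1.0) for _ in range(400)]     # P = Q = N(0, 1)
shifted = [rng.gauss(1.0, 1.0) for _ in range(400)]  # P = N(1, 1)
print(f"P = Q:      MMD^2 ~ {mmd2_analytic_target(same):.4f}")
print(f"P = N(1,1): MMD^2 ~ {mmd2_analytic_target(shifted):.4f}  (population value about 0.177)")
```

For ell = s = 1 and P = N(1, 1) the population value is 2/sqrt(3) - 2*exp(-1/6)/sqrt(3), roughly 0.177, which the sample estimate should approach as n grows.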
(2021-10-15T21:29:27Z)
- Personalized Trajectory Prediction via Distribution Discrimination [78.69458579657189]
Trajectory prediction is confronted with the dilemma of capturing the multi-modal nature of future dynamics.
We present a distribution discrimination (DisDis) method to predict personalized motion patterns.
Our method can be integrated with existing multi-modal predictive models as a plug-and-play module.
(2021-07-29T17:42:12Z)
- A unified view of likelihood ratio and reparameterization gradients [91.4645013545015]
We use a first-principles approach to explain that likelihood ratio (LR) and reparameterization (RP) gradients are alternative methods of keeping track of the movement of probability mass.
We show that the space of all possible estimators combining LR and RP can be completely parameterized by a flow field.
We prove that there cannot exist a single-sample estimator of this type outside our space, thus, clarifying where we should be searching for better Monte Carlo gradient estimators.
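The two estimators can be compared on a toy problem with a known gradient: for x ~ N(mu, 1), d/dmu E[x^2] = 2*mu. The LR (score-function) estimator averages f(x) * d/dmu log p(x; mu) = x^2 * (x - mu), while the RP estimator writes x = mu + eps with eps ~ N(0, 1) and averages d/dmu (mu + eps)^2 = 2*(mu + eps). Both are unbiased; on this problem the per-sample variance is 4 for RP versus mu^4 + 14*mu^2 + 15 for LR. This sketch covers only the two standard estimators, not the flow-field parameterization of the paper; names and sample sizes are assumptions.

```python
import random


def lr_and_rp_samples(mu, n, seed=0):
    """Per-sample estimates of d/dmu E_{x ~ N(mu, 1)}[x^2], whose true value is 2*mu."""
    rng = random.Random(seed)
    lr_terms, rp_terms = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x = mu + eps  # the same draw feeds both estimators
        # LR / score function: f(x) * d/dmu log N(x; mu, 1) = x^2 * (x - mu)
        lr_terms.append(x * x * (x - mu))
        # RP / pathwise: d/dmu f(mu + eps) = 2 * (mu + eps)
        rp_terms.append(2.0 * x)
    return lr_terms, rp_terms


def mean_and_variance(values):
    m = sum(values) / len(values)
    v = sum((t - m) ** 2 for t in values) / (len(values) - 1)
    return m, v


mu = 1.5
lr_terms, rp_terms = lr_and_rp_samples(mu, n=100_000)
lr_mean, lr_var = mean_and_variance(lr_terms)
rp_mean, rp_var = mean_and_variance(rp_terms)
print("true gradient:", 2 * mu)
print(f"LR: mean {lr_mean:.3f}, per-sample variance {lr_var:.1f}")
print(f"RP: mean {rp_mean:.3f}, per-sample variance {rp_var:.2f}")
```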
(2021-05-31T11:53:08Z)
- Generalized Sliced Distances for Probability Distributions [47.543990188697734]
We introduce a broad family of probability metrics, called Generalized Sliced Probability Metrics (GSPMs).
GSPMs are rooted in the generalized Radon transform and come with a unique geometric interpretation.
We consider GSPM-based gradient flows for generative modeling applications and show that under mild assumptions, the gradient flow converges to the global optimum.
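The most familiar special case of a sliced distance is the sliced Wasserstein distance, which uses the ordinary linear Radon transform: project both samples onto random directions theta, compute the closed-form 1-D Wasserstein distance between the projections by sorting, and average over directions. The sketch below implements only that linear case for equal-size 2-D samples with Monte Carlo directions; it does not implement the generalized (nonlinear) Radon slices that define GSPMs, and all names are assumptions.

```python
import math
import random


def sliced_w2_squared(xs, ys, n_dirs=2000, seed=0):
    """Monte Carlo squared sliced 2-Wasserstein distance between two
    equal-size 2-D point clouds, using uniformly random unit directions."""
    assert len(xs) == len(ys)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_dirs):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        theta = (math.cos(phi), math.sin(phi))
        # Project onto theta; in 1-D the optimal coupling pairs sorted values.
        px = sorted(theta[0] * x[0] + theta[1] * x[1] for x in xs)
        py = sorted(theta[0] * y[0] + theta[1] * y[1] for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(xs)
    return total / n_dirs


rng = random.Random(1)
xs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100)]
shift = (3.0, 4.0)
ys = [(x[0] + shift[0], x[1] + shift[1]) for x in xs]
print("SW2^2(X, X)   =", sliced_w2_squared(xs, xs))
print("SW2^2(X, X+t) =", sliced_w2_squared(xs, ys))
```

For a pure translation t, every slice contributes (theta . t)^2, whose average over the unit circle is ||t||^2 / 2 = 12.5 for t = (3, 4), which gives a simple sanity check on the Monte Carlo estimate.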
(2020-02-28T04:18:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.