A Concentration Inequality for Maximum Mean Discrepancy (MMD)-based Statistics and Its Application in Generative Models
- URL: http://arxiv.org/abs/2405.14051v2
- Date: Sun, 20 Oct 2024 05:09:53 GMT
- Title: A Concentration Inequality for Maximum Mean Discrepancy (MMD)-based Statistics and Its Application in Generative Models
- Authors: Yijin Ni, Xiaoming Huo,
- Abstract summary: We propose a uniform concentration inequality for a class of Maximum Mean Discrepancy (MMD)-based estimators.
Our inequality serves as an efficient tool in the theoretical analysis for MMD-based generative models.
- Score: 4.757470449749877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Maximum Mean Discrepancy (MMD) is a probability metric that has found numerous applications in machine learning. In this work, we focus on its application in generative models, including the minimum MMD estimator, Generative Moment Matching Network (GMMN), and Generative Adversarial Network (GAN). In these cases, MMD is part of an objective function in a minimization or min-max optimization problem. Even if its empirical performance is competitive, the consistency and convergence rate analysis of the corresponding MMD-based estimators has yet to be carried out. We propose a uniform concentration inequality for a class of Maximum Mean Discrepancy (MMD)-based estimators, that is, a maximum deviation bound of empirical MMD values over a collection of generated distributions and adversarially learned kernels. Here, our inequality serves as an efficient tool in the theoretical analysis for MMD-based generative models. As elaborating examples, we applied our main result to provide the generalization error bounds for the MMD-based estimators in the context of the minimum MMD estimator and MMD GAN.
Related papers
- Consistent Estimation of a Class of Distances Between Covariance Matrices [7.291687946822539]
We are interested in the family of distances that can be expressed as sums of traces of functions that are separately applied to each covariance matrix.
A statistical analysis of the behavior of this class of distance estimators has also been conducted.
We present a central limit theorem that establishes the Gaussianity of these estimators and provides closed form expressions for the corresponding means and variances.
arXiv Detail & Related papers (2024-09-18T07:36:25Z) - A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models [63.949883238901414]
We present a unique angle of gradient analysis of loss functions that simultaneously reward good examples and penalize bad ones in LMs.
We find that ExMATE serves as a superior surrogate for MLE, and that combining DPO with ExMATE instead of MLE further enhances both the statistical (5-7%) and generative (+18% win rate) performance.
arXiv Detail & Related papers (2024-08-29T17:46:18Z) - Statistical Framework for Clustering MU-MIMO Wireless via Second Order Statistics [8.195126516665914]
We consider an estimator of the Log-Euclidean distance between multiple sample covariance matrices (SCMs) consistent when the number of samples and the observation size grow unbounded at the same rate.
We develop a statistical framework that allows accurate predictions of the clustering algorithm's performance under realistic conditions.
arXiv Detail & Related papers (2024-08-08T14:23:06Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Partial identification of kernel based two sample tests with mismeasured
data [5.076419064097733]
Two-sample tests such as the Maximum Mean Discrepancy (MMD) are often used to detect differences between two distributions in machine learning applications.
We study the estimation of the MMD under $epsilon$-contamination, where a possibly non-random $epsilon$ proportion of one distribution is erroneously grouped with the other.
We propose a method to estimate these bounds, and show that it gives estimates that converge to the sharpest possible bounds on the MMD as sample size increases.
arXiv Detail & Related papers (2023-08-07T13:21:58Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimates the intractable marginal likelihood of deep generative models.
We present a parameteric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer number of steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - Targeted Separation and Convergence with Kernel Discrepancies [61.973643031360254]
kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or (ii) control weak convergence to P.
In this article we derive new sufficient and necessary conditions to ensure (i) and (ii)
For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels.
arXiv Detail & Related papers (2022-09-26T16:41:16Z) - Cycle Consistent Probability Divergences Across Different Spaces [38.43511529063335]
Discrepancy measures between probability distributions are at the core of statistical inference and machine learning.
This work proposes a novel unbalanced Monge optimal transport formulation for matching, up to isometries, distributions on different spaces.
arXiv Detail & Related papers (2021-11-22T16:35:58Z) - Maximum Mean Discrepancy for Generalization in the Presence of
Distribution and Missingness Shift [0.0]
We find that integrating an MMD loss component helps models use the best features for generalization and avoid dangerous extrapolation as much as possible for each test sample.
Models treated with this MMD approach show better performance, calibration, and extrapolation on the test set.
arXiv Detail & Related papers (2021-11-19T18:01:05Z) - On the Optimization Landscape of Maximum Mean Discrepancy [26.661542645011046]
Generative models have been successfully used for generating realistic signals.
Because the likelihood function is typically intractable in most of these models, the common practice is to "implicit" that avoid likelihood calculation.
In particular, it is not understood when they can minimize their non-repancy objectives globally.
arXiv Detail & Related papers (2021-10-26T07:32:37Z) - Keep it Tighter -- A Story on Analytical Mean Embeddings [0.6445605125467574]
Kernel techniques are among the most popular and flexible approaches in data science.
Mean embedding gives rise to a divergence measure referred to as maximum mean discrepancy (MMD)
In this paper we focus on the problem of MMD estimation when the mean embedding of one of the underlying distributions is available analytically.
arXiv Detail & Related papers (2021-10-15T21:29:27Z) - Kernel distance measures for time series, random fields and other
structured data [71.61147615789537]
kdiff is a novel kernel-based measure for estimating distances between instances of structured data.
It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution.
Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
arXiv Detail & Related papers (2021-09-29T22:54:17Z) - Fast and Efficient MMD-based Fair PCA via Optimization over Stiefel
Manifold [41.58534159822546]
This paper defines fair principal component analysis (PCA) as minimizing the maximum discrepancy (MMD) between dimensionality-reduced conditional distributions.
We provide optimality guarantees and explicitly show the theoretical effect in practical settings.
arXiv Detail & Related papers (2021-09-23T08:06:02Z) - A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Entropy Minimizing Matrix Factorization [102.26446204624885]
Nonnegative Matrix Factorization (NMF) is a widely-used data analysis technique, and has yielded impressive results in many real-world tasks.
In this study, an Entropy Minimizing Matrix Factorization framework (EMMF) is developed to tackle the above problem.
Considering that the outliers are usually much less than the normal samples, a new entropy loss function is established for matrix factorization.
arXiv Detail & Related papers (2021-03-24T21:08:43Z) - Rethink Maximum Mean Discrepancy for Domain Adaptation [77.2560592127872]
This paper theoretically proves two essential facts: 1) minimizing the Maximum Mean Discrepancy equals to maximize the source and target intra-class distances respectively but jointly minimize their variance with some implicit weights, so that the feature discriminability degrades.
Experiments on several benchmark datasets not only prove the validity of theoretical results but also demonstrate that our approach could perform better than the comparative state-of-art methods substantially.
arXiv Detail & Related papers (2020-07-01T18:25:10Z) - Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
Esting Kullback-Leibler divergence from identical and independently distributed samples is an important problem in various domains.
One simple and effective estimator is based on the k nearest neighbor between these samples.
arXiv Detail & Related papers (2020-02-26T16:37:37Z) - Localized Debiased Machine Learning: Efficient Inference on Quantile
Treatment Effects and Beyond [69.83813153444115]
We consider an efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference.
Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances.
We propose localized debiased machine learning (LDML), which avoids this burdensome step.
arXiv Detail & Related papers (2019-12-30T14:42:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.