(De)-regularized Maximum Mean Discrepancy Gradient Flow
- URL: http://arxiv.org/abs/2409.14980v1
- Date: Mon, 23 Sep 2024 12:57:42 GMT
- Title: (De)-regularized Maximum Mean Discrepancy Gradient Flow
- Authors: Zonghao Chen, Aratrika Mustafi, Pierre Glaser, Anna Korba, Arthur Gretton, Bharath K. Sriperumbudur
- Abstract summary: We introduce a (de)-regularization of the Maximum Mean Discrepancy (DrMMD) and its Wasserstein gradient flow.
The DrMMD flow can simultaneously guarantee near-global convergence for a broad class of targets in both continuous and discrete time, and be implemented in closed form using only samples.
Our numerical scheme uses an adaptive de-regularization schedule throughout the flow to optimally trade off between discretization errors and deviations from the $\chi^2$ regime.
- Score: 27.70783952195201
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a (de)-regularization of the Maximum Mean Discrepancy (DrMMD) and its Wasserstein gradient flow. Existing gradient flows that transport samples from a source distribution to a target distribution using only target samples either lack a tractable numerical implementation ($f$-divergence flows) or require strong assumptions, and modifications such as noise injection, to ensure convergence (Maximum Mean Discrepancy flows). In contrast, the DrMMD flow can simultaneously (i) guarantee near-global convergence for a broad class of targets in both continuous and discrete time, and (ii) be implemented in closed form using only samples. The former is achieved by leveraging the connection between the DrMMD and the $\chi^2$-divergence, while the latter comes from treating DrMMD as MMD with a de-regularized kernel. Our numerical scheme uses an adaptive de-regularization schedule throughout the flow to optimally trade off between discretization errors and deviations from the $\chi^2$ regime. The potential application of the DrMMD flow is demonstrated across several numerical experiments, including a large-scale setting of training student/teacher networks.
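To make the "closed form using only samples" aspect concrete, below is a minimal sketch of a generic MMD-type particle gradient flow in PyTorch: particles follow the negative gradient of the squared MMD to a fixed set of target samples, using a plain Gaussian kernel. This is only an illustration of the general mechanism; the paper's DrMMD flow instead uses a de-regularized kernel and an adaptive de-regularization schedule, neither of which is reproduced here, and all names below are illustrative.

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix k(x_i, y_j)."""
    d2 = torch.cdist(x, y) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd_squared(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of MMD^2 between sample sets x and y."""
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2 * kxy

def mmd_flow(particles, target, n_steps=500, step_size=0.1, sigma=1.0):
    """Discretized MMD gradient flow: move particles along -grad MMD^2(particles, target)."""
    x = particles.clone().requires_grad_(True)
    for _ in range(n_steps):
        loss = mmd_squared(x, target, sigma)
        grad, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x -= step_size * grad
    return x.detach()

# Toy usage: transport a standard Gaussian cloud towards a shifted target.
source = torch.randn(200, 2)
target = torch.randn(200, 2) + torch.tensor([3.0, 0.0])
out = mmd_flow(source, target)
```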
Related papers
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependence for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z) - Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV).
We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
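For readers less familiar with the forward process being analysed, here is a minimal Euler-Maruyama simulation of the Ornstein-Uhlenbeck forward diffusion dX_t = -X_t dt + sqrt(2) dW_t, which noises data towards the standard Gaussian. The paper derives TV bounds for such processes rather than simulating them; everything below is a generic, self-contained illustration.

```python
import numpy as np

def ou_forward(x0, t_end=5.0, n_steps=500, rng=None):
    """Euler-Maruyama simulation of the OU forward SDE dX_t = -X_t dt + sqrt(2) dW_t.

    As t grows, the law of X_t approaches N(0, I) regardless of the starting point x0.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = t_end / n_steps
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + (-x) * dt + np.sqrt(2 * dt) * noise
    return x

# Start far from the origin (large distance R to the data mode).
samples = ou_forward(np.full((10_000, 1), 8.0))
print(samples.mean(), samples.std())  # roughly 0 and 1 after enough time
```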
arXiv Detail & Related papers (2024-08-25T10:28:31Z) - Importance Corrected Neural JKO Sampling [0.0]
We combine continuous normalizing flows (CNFs) with rejection-resampling steps based on importance weights.
The resulting model can be trained iteratively, reduces the reverse Kullback-Leibler (KL) loss in each step, and allows generating i.i.d. samples.
Numerical examples show that our method yields accurate results on various test distributions including high-dimensional multimodal targets.
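The importance-weighted resampling building block, taken in isolation, can be sketched in a few lines: given proposal draws and their importance weights w ∝ target/proposal, resample according to the normalized weights so the retained set better matches the target. This is only the generic correction step, not the paper's method, which couples such steps with learned continuous normalizing flows and a specific rejection rule; the names below are illustrative.

```python
import numpy as np

def importance_resample(samples, log_target, log_proposal, rng=None):
    """Generic importance-weighted resampling step.

    samples: proposal draws, shape (n, d)
    log_target / log_proposal: callables returning (unnormalized) log-densities.
    Returns a resampled set whose empirical distribution is closer to the target.
    """
    rng = np.random.default_rng() if rng is None else rng
    log_w = log_target(samples) - log_proposal(samples)
    w = np.exp(log_w - log_w.max())          # stabilize before normalizing
    w /= w.sum()
    idx = rng.choice(len(samples), size=len(samples), p=w)
    return samples[idx]

# Toy usage: correct N(0,1) proposal draws towards an N(2,1) target.
rng = np.random.default_rng(0)
x = rng.standard_normal((5000, 1))
log_p = lambda s: -0.5 * ((s - 2.0) ** 2).sum(axis=1)
log_q = lambda s: -0.5 * (s ** 2).sum(axis=1)
y = importance_resample(x, log_p, log_q, rng)
print(y.mean())  # noticeably shifted towards 2
```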
arXiv Detail & Related papers (2024-07-29T22:49:59Z) - Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows [10.153270126742369]
We study efficient approximate sampling for probability distributions known up to normalization constants.
We specifically focus on a problem class arising in Bayesian inference for large-scale inverse problems in science and engineering applications.
arXiv Detail & Related papers (2024-06-25T04:07:22Z) - Deep MMD Gradient Flow without adversarial training [69.76417786943217]
We propose a gradient flow procedure for generative modeling by transporting particles from an initial source distribution to a target distribution.
The noise-adaptive Wasserstein gradient of the Maximum Mean Discrepancy (MMD) is trained on data distributions corrupted by increasing levels of noise.
We demonstrate the validity of the approach when MMD is replaced by a lower bound on the KL divergence.
arXiv Detail & Related papers (2024-05-10T19:10:45Z) - Deep conditional distribution learning via conditional Föllmer flow [3.227277661633986]
We introduce an ordinary differential equation (ODE) based deep generative method for learning conditional distributions, named Conditional Föllmer Flow.
For effective implementation, we discretize the flow with Euler's method, estimating the velocity field nonparametrically using a deep neural network.
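As an illustration of the Euler-discretized flow (not the paper's specific velocity-field estimator or time parametrization), the sketch below integrates dx/dt = v(x, t, y) from Gaussian noise at t=0 to a sample at t=1 with a placeholder conditional neural velocity field; the architecture, time interval, and conditioning mechanism are all assumptions.

```python
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    """Placeholder conditional velocity field v(x, t, y); not the paper's estimator."""
    def __init__(self, x_dim, y_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x, t, y):
        t = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, y, t], dim=1))

@torch.no_grad()
def euler_sample(v, y, x_dim, n_steps=100):
    """Euler discretization of dx/dt = v(x, t, y), from x_0 ~ N(0, I) at t=0 to t=1."""
    x = torch.randn(y.shape[0], x_dim)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = torch.full((1, 1), k * dt)
        x = x + dt * v(x, t, y)
    return x

# Toy usage with an untrained network (structure only).
v = VelocityField(x_dim=2, y_dim=3)
cond = torch.randn(16, 3)
samples = euler_sample(v, cond, x_dim=2)
```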
arXiv Detail & Related papers (2024-02-02T14:52:10Z) - Mixed Variational Flows for Discrete Variables [14.00384446902181]
We develop a variational flow family for discrete distributions without any continuous embedding.
First, we develop a measure-preserving and discrete (MAD) invertible map that leaves the discrete target invariant.
We also develop an extension to MAD Mix that handles joint discrete and continuous models.
arXiv Detail & Related papers (2023-08-29T20:13:37Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
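For context, here is a minimal textbook Annealed Importance Sampling sketch with a fixed geometric annealing schedule and random-walk Metropolis transitions. The paper's contribution, choosing the schedule adaptively so that annealing progresses at a constant rate (and the $\alpha$-divergence extension), is not implemented here; all names and defaults are illustrative.

```python
import numpy as np

def ais(log_prior, sample_prior, log_target, n_samples=1000, n_temps=50,
        n_mcmc=5, step=0.5, rng=None):
    """Plain AIS: anneal from the prior to the target along the geometric path
    pi_beta(x) proportional to prior(x)^(1-beta) * target(x)^beta,
    accumulating log importance weights along the way."""
    rng = np.random.default_rng() if rng is None else rng
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    x = sample_prior(n_samples, rng)
    log_w = np.zeros(n_samples)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Weight update: ratio of successive annealed densities at the current points.
        log_w += (b - b_prev) * (log_target(x) - log_prior(x))
        # A few random-walk Metropolis steps targeting pi_b.
        log_pi = lambda z: (1 - b) * log_prior(z) + b * log_target(z)
        for _ in range(n_mcmc):
            prop = x + step * rng.standard_normal(x.shape)
            accept = np.log(rng.uniform(size=n_samples)) < log_pi(prop) - log_pi(x)
            x[accept] = prop[accept]
    return x, log_w

# Toy usage: prior N(0, 4^2), target N(3, 1) in 1D (unnormalized log-densities).
log_prior = lambda z: -0.5 * (z[:, 0] / 4.0) ** 2
sample_prior = lambda n, rng: 4.0 * rng.standard_normal((n, 1))
log_target = lambda z: -0.5 * (z[:, 0] - 3.0) ** 2
x, log_w = ais(log_prior, sample_prior, log_target)
```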
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - On Calibrating Diffusion Probabilistic Models [78.75538484265292]
Diffusion probabilistic models (DPMs) have achieved promising results in diverse generative tasks.
We propose a simple way to calibrate an arbitrary pretrained DPM, with which the score matching loss can be reduced and the lower bounds of model likelihood can be increased.
Our calibration method is performed only once and the resulting models can be used repeatedly for sampling.
arXiv Detail & Related papers (2023-02-21T14:14:40Z) - Pseudo Numerical Methods for Diffusion Models on Manifolds [77.40343577960712]
Denoising Diffusion Probabilistic Models (DDPMs) can generate high-quality samples such as images and audio, but they require hundreds to thousands of iterations to produce final samples.
We propose pseudo numerical methods for diffusion models (PNDMs).
PNDMs can generate higher-quality synthetic images with only 50 steps, compared with 1000-step DDIMs (a 20x speedup).
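As a rough illustration of the pseudo-numerical idea (and not the paper's exact algorithm, which also uses Runge-Kutta warm-up steps and a specific transfer function), the sketch below combines a deterministic DDIM-style transfer step with a classical fourth-order Adams-Bashforth combination of the last four noise predictions; `eps_model`, the alpha-bar schedule, and the timestep grid are placeholders.

```python
import torch

@torch.no_grad()
def plms_sample(eps_model, x, alpha_bars, timesteps):
    """Linear-multistep sampling sketch: reuse previous noise predictions
    (Adams-Bashforth-style combination) inside a deterministic DDIM-style transfer step."""
    eps_buffer = []
    for i in range(len(timesteps) - 1):
        t, t_prev = timesteps[i], timesteps[i + 1]
        eps = eps_model(x, t)
        eps_buffer.append(eps)
        if len(eps_buffer) < 4:
            eps_bar = eps  # warm-up: plain steps until the buffer fills (the paper differs here)
        else:
            e1, e2, e3, e4 = eps_buffer[-4:]
            eps_bar = (55 * e4 - 59 * e3 + 37 * e2 - 9 * e1) / 24
        a_t, a_prev = alpha_bars[t], alpha_bars[t_prev]
        # Deterministic transfer: predict x0, then move to the previous timestep.
        x0 = (x - (1 - a_t).sqrt() * eps_bar) / a_t.sqrt()
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps_bar
    return x

# Toy usage with a placeholder noise model and schedule (illustration only).
alpha_bars = torch.linspace(0.999, 0.01, 1000)
eps_model = lambda x, t: torch.zeros_like(x)   # stand-in for a trained network
timesteps = list(range(999, -1, -20))          # 50 sampling steps
x = plms_sample(eps_model, torch.randn(4, 2), alpha_bars, timesteps)
```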
arXiv Detail & Related papers (2022-02-20T10:37:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.