Wasserstein proximal operators describe score-based generative models
and resolve memorization
- URL: http://arxiv.org/abs/2402.06162v1
- Date: Fri, 9 Feb 2024 03:33:13 GMT
- Title: Wasserstein proximal operators describe score-based generative models
and resolve memorization
- Authors: Benjamin J. Zhang, Siting Liu, Wuchen Li, Markos A. Katsoulakis, and
Stanley J. Osher
- Abstract summary: We first formulate SGMs in terms of the Wasserstein proximal operator (WPO).
We show that WPO describes the inductive bias of diffusion and score-based models.
We present an interpretable kernel-based model for the score function which dramatically improves the performance of SGMs.
- Score: 12.321631823103894
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We focus on the fundamental mathematical structure of score-based generative
models (SGMs). We first formulate SGMs in terms of the Wasserstein proximal
operator (WPO) and demonstrate that, via mean-field games (MFGs), the WPO
formulation reveals mathematical structure that describes the inductive bias of
diffusion and score-based models. In particular, MFGs yield optimality
conditions in the form of a pair of coupled partial differential equations: a
forward-controlled Fokker-Planck (FP) equation, and a backward
Hamilton-Jacobi-Bellman (HJB) equation. Via a Cole-Hopf transformation and
taking advantage of the fact that the cross-entropy can be related to a linear
functional of the density, we show that the HJB equation is an uncontrolled FP
equation. Second, with the mathematical structure at hand, we present an
interpretable kernel-based model for the score function which dramatically
improves the performance of SGMs in terms of training samples and training
time. In addition, the WPO-informed kernel model is explicitly constructed to
avoid the recently studied memorization effects of score-based generative
models. The mathematical form of the new kernel-based models in combination
with the use of the terminal condition of the MFG reveals new explanations for
the manifold learning and generalization properties of SGMs, and provides a
resolution to their memorization effects. Finally, our mathematically informed,
interpretable kernel-based model suggests new scalable bespoke neural network
architectures for high-dimensional applications.
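To make the PDE structure in the abstract concrete, here is a schematic version of the MFG optimality system and the Cole-Hopf step, written in LaTeX for the simplest setting: quadratic running cost, zero base drift, constant diffusion coefficient beta, and optimal control v = grad U. These conventions are illustrative assumptions, not the paper's exact system.

    % Schematic MFG optimality system for an SGM (illustrative conventions:
    % optimal control v = \nabla U, constant diffusion \beta, no base drift).
    \begin{align*}
      &\partial_t \rho + \nabla\cdot(\rho\,\nabla U) = \beta\,\Delta\rho,
        \quad \rho(x,0)=\rho_{\mathrm{data}}(x)
        && \text{(forward controlled FP)} \\
      &\partial_t U + \tfrac{1}{2}\|\nabla U\|^{2} + \beta\,\Delta U = 0
        && \text{(backward HJB)}
    \end{align*}
    % Cole-Hopf: substituting $\eta = e^{U/(2\beta)}$ linearizes the
    % nonlinear HJB into a backward heat equation, i.e. an uncontrolled
    % FP equation run in reverse time:
    \begin{equation*}
      \partial_t \eta + \beta\,\Delta\eta = 0 .
    \end{equation*}

In the paper's formulation, the terminal condition of the MFG encodes the cross-entropy as a linear functional of the density, which is what lets eta be interpreted as a density and the linearized HJB as an uncontrolled FP equation.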
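The kernel-based score model can likewise be illustrated with a minimal Python sketch. The snippet below evaluates the generic closed-form score of a Gaussian-kernel density centered at the training samples, under an assumed variance-exploding schedule sigma_t^2 = sigma0^2 + t; the function name, the schedule, and the uniform empirical weights are all illustrative assumptions, not the paper's WPO-informed construction.

    import numpy as np

    def kernel_score(x, t, data, sigma0=1e-2):
        """Score grad_x log p_t(x) of p_t = (1/n) sum_i N(x; x_i, var_t I),
        with var_t = sigma0^2 + t (an assumed variance-exploding schedule).
        x: (d,) query point; t: diffusion time; data: (n, d) training samples.
        """
        var = sigma0**2 + t                     # kernel variance at time t
        diff = data - x                         # (n, d) displacements x_i - x
        logw = -np.sum(diff**2, axis=1) / (2.0 * var)
        w = np.exp(logw - logw.max())           # numerically stable weights
        w /= w.sum()                            # posterior kernel weights w_i
        # grad_x log p_t(x) = sum_i w_i (x_i - x) / var_t
        return (w[:, None] * diff).sum(axis=0) / var

Plugging such a score into a reverse-time SDE or probability-flow ODE sampler gives a training-free generator, but as t -> 0 the weights w_i collapse onto the nearest training sample, which is exactly the memorization behavior the paper's WPO-informed kernel model and MFG terminal condition are constructed to avoid.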
Related papers
- Deep Koopman-layered Model with Universal Property Based on Toeplitz Matrices [26.96258010698567]
The proposed model offers both theoretical soundness and flexibility.
This flexibility enables the model to fit time-series data from nonautonomous dynamical systems.
arXiv Detail & Related papers (2024-10-03T04:27:46Z)
- Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space [72.52365911990935]
We introduce Bellman Diffusion, a novel deep generative model (DGM) framework that maintains linearity in Markov decision processes (MDPs) through gradient and scalar field modeling.
Our results show that Bellman Diffusion achieves accurate field estimations and is a capable image generator, converging 1.5x faster than the traditional histogram-based baseline in distributional RL tasks.
arXiv Detail & Related papers (2024-10-02T17:53:23Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- A mean-field games laboratory for generative modeling [5.837881923712394]
Mean-field games (MFGs) are a framework for explaining, enhancing, and designing generative models.
We study the mathematical properties of each generative model by analyzing the optimality conditions of its associated MFG.
We propose and demonstrate an HJB-regularized SGM with improved performance over standard SGMs.
arXiv Detail & Related papers (2023-04-26T13:08:50Z)
- Generalized Neural Closure Models with Interpretability [28.269731698116257]
We develop a novel and versatile methodology based on unified neural partial delay differential equations.
We augment existing/low-fidelity dynamical models directly in their partial differential equation (PDE) forms with both Markovian and non-Markovian neural network (NN) closure parameterizations.
We demonstrate the new generalized neural closure models (gnCMs) framework using four sets of experiments based on advecting nonlinear waves, shocks, and ocean acidification models.
arXiv Detail & Related papers (2023-01-15T21:57:43Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes factorizing the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero-shot and few-shot adaptation in low-data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
arXiv Detail & Related papers (2022-01-08T00:47:50Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
- Kernel-Based Models for Influence Maximization on Graphs based on Gaussian Process Variance Minimization [9.357483974291899]
We introduce and investigate a novel model for influence maximization (IM) on graphs.
Data-driven approaches can be applied to determine proper kernels for this IM model.
Compared to models in this field that rely on costly Monte-Carlo simulations, our model allows for a simple and cost-efficient update strategy.
arXiv Detail & Related papers (2021-03-02T08:55:34Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent.
For the first time, we provide a tractable estimation procedure for SEMs based on NNs, with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Estimation of sparse Gaussian graphical models with hidden clustering structure [8.258451067861932]
We propose a model for estimating sparse Gaussian graphical models with a hidden clustering structure.
We develop a symmetric Gauss-Seidel-based alternating direction method of multipliers (ADMM).
Numerical experiments on both synthetic and real data demonstrate the good performance of our model.
arXiv Detail & Related papers (2020-04-17T08:43:31Z)