A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models
- URL: http://arxiv.org/abs/1910.14216v7
- Date: Fri, 28 Apr 2023 16:03:06 GMT
- Title: A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models
- Authors: Yang Wu and Pengxu Wei and Liang Lin
- Abstract summary: We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, the Wasserstein gradient flow is a smoother and near-optimal numerical scheme for approximating real data densities.
- Score: 93.24030378630175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel numerical scheme to optimize the gradient
flows for learning energy-based models (EBMs). From the perspective of physical
simulation, we redefine the problem of approximating the gradient flow using the
optimal transport (i.e., Wasserstein) metric. In EBMs, the learning process of
stepwise sampling and estimating the data distribution performs a functional
gradient descent that minimizes the global relative entropy between the current
and the target real distribution, which can be viewed as dynamic particles
moving from disorder toward the target manifold. Previous learning schemes mainly
minimize the entropy with respect to the consecutive-time KL divergence at each
learning step. However, they are prone to getting stuck in a local KL divergence
by projecting non-smooth information onto a smooth manifold, which violates the
optimal transport principle. To solve this problem, we derive a second-order
Wasserstein gradient flow of the global relative entropy from the Fokker-Planck
equation. Compared with existing schemes, the Wasserstein gradient flow is a
smoother and near-optimal numerical scheme for approximating real data densities.
We also derive its near-proximal scheme and provide the corresponding numerical
computation equations. Our extensive experiments demonstrate the practical
superiority and potential of the proposed scheme for fitting complex
distributions and generating high-quality, high-dimensional data with neural
EBMs.
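For context, the following is a minimal sketch of the standard identities the abstract builds on (Otto's calculus and the JKO proximal step), stated for a generic energy E with model density p ∝ e^{-E}; the paper's specific second-order, near-proximal scheme is a refinement of this picture and is not reproduced here.

```latex
% Target density and relative entropy (free energy up to an additive constant):
%   p(x) \propto e^{-E(x)}, \qquad
%   \mathrm{KL}(\rho \,\|\, p) = \int \rho(x) \log \frac{\rho(x)}{p(x)}\, dx .
%
% The Fokker--Planck equation is the Wasserstein-2 gradient flow of the KL divergence:
\partial_t \rho_t
  = \nabla \cdot \bigl( \rho_t \nabla E \bigr) + \Delta \rho_t
  = \nabla \cdot \Bigl( \rho_t \, \nabla \tfrac{\delta\, \mathrm{KL}(\rho_t \| p)}{\delta \rho_t} \Bigr)
  = -\, \mathrm{grad}_{W_2}\, \mathrm{KL}(\rho_t \,\|\, p) .
%
% Proximal (JKO) time discretization with step size \tau:
\rho_{k+1} = \operatorname*{arg\,min}_{\rho}\;
  \mathrm{KL}(\rho \,\|\, p) + \frac{1}{2\tau}\, W_2^2(\rho, \rho_k) .
```

In the EBM setting, E is the neural energy, and the flow describes how the sampler's density evolves toward the model distribution during the stepwise sampling and estimation described above.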
Related papers
- Kernel Approximation of Fisher-Rao Gradient Flows [52.154685604660465]
We present a rigorous investigation of Fisher-Rao and Wasserstein type gradient flows concerning their gradient structures, flow equations, and their kernel approximations.
Specifically, we focus on the Fisher-Rao geometry and its various kernel-based approximations, developing a principled theoretical framework.
arXiv Detail & Related papers (2024-10-27T22:52:08Z) - Dynamical Measure Transport and Neural PDE Solvers for Sampling [77.38204731939273]
We frame the task of sampling from a probability density as transporting a tractable density function to the target.
We employ physics-informed neural networks (PINNs) to approximate the respective partial differential equations (PDEs) solutions.
PINNs allow for simulation- and discretization-free optimization and can be trained very efficiently.
arXiv Detail & Related papers (2024-07-10T17:39:50Z) - Interaction-Force Transport Gradient Flows [45.05400562268213]
This paper presents a new gradient flow dissipation geometry over non-negative and probability measures.
Using a precise connection between the Hellinger geometry and the maximum mean discrepancy (MMD), we propose the interaction-force transport (IFT) gradient flows.
arXiv Detail & Related papers (2024-05-27T11:46:14Z) - Momentum Particle Maximum Likelihood [2.4561590439700076]
We propose an analogous dynamical-systems-inspired approach to minimizing the free energy functional.
By discretizing the system, we obtain a practical algorithm for maximum likelihood estimation in latent variable models.
The algorithm outperforms existing particle methods in numerical experiments and compares favourably with other MLE algorithms.
arXiv Detail & Related papers (2023-12-12T14:53:18Z) - Flow-based Distributionally Robust Optimization [23.232731771848883]
We present a framework, called FlowDRO, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets.
We aim to find the continuous worst-case distribution (also called the least favorable distribution, LFD) and sample from it.
We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy.
arXiv Detail & Related papers (2023-10-30T03:53:31Z) - Neural Wasserstein Gradient Flows for Maximum Mean Discrepancies with Riesz Kernels [1.3654846342364308]
Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals with non-smooth Riesz kernels show a rich structure.
We propose to approximate the backward scheme of Jordan, Kinderlehrer and Otto for computing such Wasserstein gradient flows.
We provide analytic formulas for Wasserstein schemes starting at a Dirac measure and show their convergence as the time step size tends to zero.
arXiv Detail & Related papers (2023-01-27T09:57:36Z) - Manifold Interpolating Optimal-Transport Flows for Trajectory Inference [64.94020639760026]
We present a method called Manifold Interpolating Optimal-Transport Flow (MIOFlow).
MIOFlow learns continuous population dynamics from static snapshot samples taken at sporadic timepoints.
We evaluate our method on simulated data with bifurcations and merges, as well as scRNA-seq data from embryoid body differentiation, and acute myeloid leukemia treatment.
arXiv Detail & Related papers (2022-06-29T22:19:03Z) - Extension of Dynamic Mode Decomposition for dynamic systems with
incomplete information based on t-model of optimal prediction [69.81996031777717]
The Dynamic Mode Decomposition has proved to be a very efficient technique to study dynamic data.
The application of this approach becomes problematic if the available data is incomplete because some smaller-scale dimensions are either missing or unmeasured.
We consider a first-order approximation of the Mori-Zwanzig decomposition, state the corresponding optimization problem, and solve it with a gradient-based optimization method.
arXiv Detail & Related papers (2022-02-23T11:23:59Z) - Large-Scale Wasserstein Gradient Flows [84.73670288608025]
We introduce a scalable scheme to approximate Wasserstein gradient flows.
Our approach relies on input convex neural networks (ICNNs) to discretize the JKO steps.
As a result, we can sample from the measure at each step of the gradient diffusion and compute its density.
arXiv Detail & Related papers (2021-06-01T19:21:48Z)