Optimal Neural Network Approximation of Wasserstein Gradient Direction
via Convex Optimization
- URL: http://arxiv.org/abs/2205.13098v1
- Date: Thu, 26 May 2022 00:51:12 GMT
- Title: Optimal Neural Network Approximation of Wasserstein Gradient Direction
via Convex Optimization
- Authors: Yifei Wang, Peng Chen, Mert Pilanci, Wuchen Li
- Abstract summary: The computation of Wasserstein gradient direction is essential for posterior sampling problems and scientific computing.
We study the variational problem in the family of two-layer networks with squared-ReLU activations, towards which we derive a semi-definite programming (SDP) relaxation.
This SDP can be viewed as an approximation of the Wasserstein gradient in a broader function family including two-layer networks.
- Score: 43.6961980403682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The computation of Wasserstein gradient direction is essential for posterior
sampling problems and scientific computing. The approximation of the
Wasserstein gradient with finite samples requires solving a variational
problem. We study the variational problem in the family of two-layer networks
with squared-ReLU activations, towards which we derive a semi-definite
programming (SDP) relaxation. This SDP can be viewed as an approximation of the
Wasserstein gradient in a broader function family including two-layer networks.
By solving the convex SDP, we obtain the optimal approximation of the
Wasserstein gradient direction in this class of functions. Numerical
experiments including PDE-constrained Bayesian inference and parameter
estimation in COVID-19 modeling demonstrate the effectiveness of the proposed
method.
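As a hedged illustration of the variational problem above (assuming the functional being minimized is the KL divergence to a target density $\pi$, which covers the posterior-sampling setting; signs and normalizations vary across references), one standard finite-sample formulation reads
$$\min_{f \in \mathcal{F}} \; \frac{1}{N}\sum_{i=1}^{N}\Big[\tfrac{1}{2}\|\nabla f(x_i)\|^2 + \Delta f(x_i) + \nabla \log \pi(x_i)^{\top}\nabla f(x_i)\Big], \qquad x_i \sim \rho,$$
whose population minimizer satisfies $\nabla f^{\star} = \nabla \log(\rho/\pi)$, so that $-\nabla f^{\star}$ is the steepest-descent (Wasserstein gradient) direction of the KL divergence; the paper takes $\mathcal{F}$ to be two-layer networks with squared-ReLU activations and relaxes the resulting nonconvex problem to a convex SDP.
To make the "solve the convex SDP" step concrete, the CVXPY toy problem below shows the generic semidefinite form (a positive-semidefinite matrix variable with linear trace constraints). It is only a sketch under these assumptions, not the paper's actual relaxation, whose data come from the sample-based objective above.
```python
import numpy as np
import cvxpy as cp

# Generic toy SDP:  minimize <C, X>  subject to  <A_i, X> = b_i,  X PSD.
# Illustration only; the paper's relaxation uses problem-specific data
# derived from the sample-based variational objective, not random matrices.
rng = np.random.default_rng(0)
n, m = 4, 3
B = rng.standard_normal((n, n))
C = B @ B.T                                  # PSD objective keeps the problem bounded below
A = [rng.standard_normal((n, n)) for _ in range(m)]
A = [(Ai + Ai.T) / 2 for Ai in A]            # symmetrize the constraint matrices
b = np.array([np.trace(Ai) for Ai in A])     # chosen so that X = I is feasible

X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0] + [cp.trace(A[i] @ X) == b[i] for i in range(m)]
problem = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
problem.solve()                              # any SDP-capable solver (e.g., SCS) works
print("status:", problem.status, "optimal value:", problem.value)
```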
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Variational Wasserstein gradient flow [9.901677207027806]
We propose a scalable proximal gradient type algorithm for Wasserstein gradient flow.
Our framework covers all the classical Wasserstein gradient flows including the heat equation and the porous medium equation.
arXiv Detail & Related papers (2021-12-04T20:27:31Z)
- q-RBFNN: A Quantum Calculus-based RBF Neural Network [31.14412266444568]
A gradient-descent-based learning approach for radial basis function neural networks (RBFNN) is proposed.
The proposed method is based on the q-gradient, which is also known as the Jackson derivative.
The proposed $q$-RBFNN is analyzed for its convergence performance in the context of the least-squares algorithm.
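For reference, the Jackson derivative mentioned above is commonly defined (this is the standard definition, not necessarily the paper's exact notation) as
$$D_q f(x) = \frac{f(qx) - f(x)}{(q-1)\,x}, \qquad x \neq 0,$$
which recovers the ordinary derivative $f'(x)$ in the limit $q \to 1$.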
arXiv Detail & Related papers (2021-06-02T08:27:12Z)
- Large-Scale Wasserstein Gradient Flows [84.73670288608025]
We introduce a scalable scheme to approximate Wasserstein gradient flows.
Our approach relies on input convex neural networks (ICNNs) to discretize the JKO steps.
As a result, we can sample from the measure at each step of the gradient diffusion and compute its density.
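For context (standard background rather than this paper's specific construction), a JKO step discretizes the Wasserstein gradient flow of a functional $F$ as
$$\rho_{k+1} = \operatorname*{arg\,min}_{\rho} \; F(\rho) + \frac{1}{2\tau} W_2^2(\rho, \rho_k),$$
with step size $\tau > 0$; the cited work parametrizes these steps with ICNNs so that each iterate can be sampled from and its density evaluated.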
arXiv Detail & Related papers (2021-06-01T19:21:48Z)
- Learning High Dimensional Wasserstein Geodesics [55.086626708837635]
We propose a new formulation and learning strategy for computing the Wasserstein geodesic between two probability distributions in high dimensions.
By applying the method of Lagrange multipliers to the dynamic formulation of the optimal transport (OT) problem, we derive a minimax problem whose saddle point is the Wasserstein geodesic.
We then parametrize the functions by deep neural networks and design a sample based bidirectional learning algorithm for training.
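For context, the dynamic (Benamou-Brenier) formulation of OT referenced above can be written as
$$W_2^2(\mu,\nu) = \min_{\rho, v} \int_0^1 \!\! \int \rho_t(x)\,\|v_t(x)\|^2 \, dx \, dt \quad \text{s.t.} \quad \partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0, \; \rho_0 = \mu, \; \rho_1 = \nu,$$
and introducing a Lagrange multiplier for the continuity-equation constraint yields the minimax problem whose saddle point is the geodesic.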
arXiv Detail & Related papers (2021-02-05T04:25:28Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order information.
We show that with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
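As a hedged sketch of the general idea only (a generic two-point random-direction ZO gradient estimator, not the hybrid, importance-sampled scheme proposed in the cited paper):
```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, num_dirs=20, rng=None):
    # Generic two-point zeroth-order gradient estimate: average the
    # directional finite difference (f(x + mu*u) - f(x - mu*u)) / (2*mu)
    # over random unit directions u, rescaled by the dimension d.
    rng = rng or np.random.default_rng()
    d = x.size
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return d * g / num_dirs

# Usage: black-box gradient descent on a simple quadratic.
f = lambda x: 0.5 * float(np.sum(x ** 2))
x = np.ones(5)
for _ in range(200):
    x = x - 0.1 * zo_gradient(f, x)
print("final objective:", f(x))
```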
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, the Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
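For reference, the classical first-order connection behind this: with target density $\pi \propto e^{-V}$, the Fokker-Planck equation
$$\partial_t \rho_t = \nabla \cdot (\rho_t \nabla V) + \Delta \rho_t = \nabla \cdot \big(\rho_t \nabla \log(\rho_t/\pi)\big)$$
is the Wasserstein gradient flow of the relative entropy $\mathrm{KL}(\rho\,\|\,\pi)$; the cited paper derives its second-order flow starting from this equation.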
arXiv Detail & Related papers (2019-10-31T02:26:20Z)