Continuous-time stochastic gradient descent for optimizing over the
stationary distribution of stochastic differential equations
- URL: http://arxiv.org/abs/2202.06637v2
- Date: Sat, 26 Aug 2023 23:36:08 GMT
- Title: Continuous-time stochastic gradient descent for optimizing over the
stationary distribution of stochastic differential equations
- Authors: Ziheng Wang and Justin Sirignano
- Abstract summary: We develop a new continuous-time stochastic gradient descent method for optimizing over the stationary distribution of stochastic differential equation (SDE) models.
We rigorously prove convergence of the online forward propagation algorithm for linear SDE models and present its numerical results for nonlinear examples.
- Score: 7.65995376636176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a new continuous-time stochastic gradient descent method for
optimizing over the stationary distribution of stochastic differential equation
(SDE) models. The algorithm continuously updates the SDE model's parameters
using an estimate for the gradient of the stationary distribution. The gradient
estimate is simultaneously updated using forward propagation of the SDE state
derivatives, asymptotically converging to the direction of steepest descent. We
rigorously prove convergence of the online forward propagation algorithm for
linear SDE models (i.e., the multi-dimensional Ornstein-Uhlenbeck process) and
present its numerical results for nonlinear examples. The proof requires
analysis of the fluctuations of the parameter evolution around the direction of
steepest descent. Bounds on the fluctuations are challenging to obtain due to
the online nature of the algorithm (e.g., the stationary distribution will
continuously change as the parameters change). We prove bounds for the
solutions of a new class of Poisson partial differential equations (PDEs),
which are then used to analyze the parameter fluctuations in the algorithm. Our
algorithm is applicable to a range of mathematical finance applications
involving statistical calibration of SDE models and stochastic optimal control
for long time horizons where ergodicity of the data and stochastic process is a
suitable modeling framework. Numerical examples explore these potential
applications, including learning a neural network control for high-dimensional
optimal control of SDEs and training stochastic point process models of limit
order book events.
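As a concrete illustration of the method, here is a minimal sketch (ours, not the authors' code) of the online forward propagation algorithm for a one-dimensional Ornstein-Uhlenbeck model dX = theta*(mu - X) dt + sigma dW. The stationary mean equals mu, so continuously updating mu with the online gradient estimate for the stationary loss E[(X - c)^2] should drive it toward the target mean c; the decaying learning rate and all constants are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 1.0, 0.5   # fixed reversion speed and noise level
c = 2.0                   # target stationary mean
mu = 0.0                  # parameter learned online
x, x_tilde = 0.0, 0.0     # SDE state and its sensitivity dx/dmu
dt, steps = 0.01, 20000   # Euler-Maruyama step size and number of steps

for k in range(steps):
    lr = 0.5 / (1.0 + 0.05 * k * dt)   # slowly decaying learning rate
    # forward propagation of the sensitivity: with drift b(x) = theta*(mu - x),
    # d x_tilde = (d_x b * x_tilde + d_mu b) dt = (-theta*x_tilde + theta) dt
    x_tilde += (-theta * x_tilde + theta) * dt
    # Euler-Maruyama step for the SDE state
    x += theta * (mu - x) * dt + sigma * rng.normal(0.0, np.sqrt(dt))
    # continuous-time SGD on the stationary loss E[(X - c)^2],
    # using the instantaneous gradient estimate 2*(x - c)*x_tilde
    mu -= lr * 2.0 * (x - c) * x_tilde * dt

print(mu)  # mu should now be close to the target c = 2.0
```

The sensitivity process converges to the direction of steepest descent while the state and parameters keep evolving, which is exactly the online character the abstract's fluctuation analysis addresses.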
Related papers
- Variational Neural Stochastic Differential Equations with Change Points [4.692174333076032]
We explore modeling change points in time-series data using neural stochastic differential equations (neural SDEs).
We propose a novel model formulation and training procedure based on the variational autoencoder (VAE) framework for modeling time-series as a neural SDE.
We present an empirical evaluation that demonstrates the expressive power of our proposed model, showing that it can effectively model both classical parametric SDEs and some real datasets with distribution shifts.
arXiv Detail & Related papers (2024-11-01T14:46:17Z)
- Noise in the reverse process improves the approximation capabilities of diffusion models [27.65800389807353]
In score-based generative modeling (SGM), the state of the art in generative modeling, stochastic reverse processes are known to perform better than their deterministic counterparts.
This paper delves into the heart of this phenomenon, comparing neural ordinary differential equations (ODEs) and neural stochastic differential equations (SDEs) as reverse processes.
We analyze the ability of neural SDEs to approximate trajectories of the Fokker-Planck equation, revealing the advantages of stochasticity.
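As a toy, self-contained version of this comparison (our sketch: for Gaussian data the score is available in closed form and stands in for a trained network), both reverse processes can be simulated under the forward Ornstein-Uhlenbeck noising dX = -X dt + sqrt(2) dW; the two samplers share their marginals, but only the SDE injects noise:

```python
import numpy as np

rng = np.random.default_rng(1)
m0, v0 = 1.5, 0.25            # data distribution N(m0, v0)
T, dt, n = 4.0, 0.01, 20000   # time horizon, step size, number of samples

def score(x, t):
    # exact score of the time-t marginal when X_0 ~ N(m0, v0)
    m_t = m0 * np.exp(-t)
    v_t = v0 * np.exp(-2.0 * t) + 1.0 - np.exp(-2.0 * t)
    return -(x - m_t) / v_t

x_sde = rng.normal(0.0, 1.0, n)   # both samplers start from the N(0, 1) prior
x_ode = x_sde.copy()
for k in range(int(T / dt)):
    t = T - k * dt
    # reverse-time SDE: dX = (-X - 2*score) dt + sqrt(2) dW, run backward in time
    x_sde += (x_sde + 2.0 * score(x_sde, t)) * dt + np.sqrt(2.0 * dt) * rng.normal(size=n)
    # probability-flow ODE: same marginals, deterministic trajectories
    x_ode += (x_ode + score(x_ode, t)) * dt

print(x_sde.mean(), x_sde.var(), x_ode.mean(), x_ode.var())
```

With the exact score both samplers recover roughly N(1.5, 0.25); the interesting regime studied in the paper is what happens when the score is only approximated.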
arXiv Detail & Related papers (2023-12-13T02:39:10Z)
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
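A minimal sketch of the dual view (our illustration: plain randomized coordinate descent on the dual system (K + lambda*I) alpha = y; the paper's actual algorithm adds further design choices, such as momentum and minibatching, that we omit):

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam = 200, 0.1                        # training size and noise/regulariser
X = rng.uniform(-3.0, 3.0, n)
y = np.sin(X) + 0.1 * rng.normal(size=n)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)   # RBF Gram matrix
A_diag = K.diagonal() + lam              # diagonal of the dual system matrix

alpha = np.zeros(n)                      # dual variables for the GP posterior mean
for _ in range(40000):
    i = rng.integers(n)                              # sample one dual coordinate
    r_i = K[i] @ alpha + lam * alpha[i] - y[i]       # its residual in (K + lam*I) alpha = y
    alpha[i] -= r_i / A_diag[i]                      # exact minimisation along coordinate i

rel_residual = np.linalg.norm(K @ alpha + lam * alpha - y) / np.linalg.norm(y)
print(rel_residual)  # small: alpha approximately solves the dual system
```

Working in the dual keeps every update O(n) and avoids forming or inverting the full kernel matrix, which is the scalability argument behind the paper's approach.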
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- A Geometric Perspective on Diffusion Models [57.27857591493788]
We inspect the ODE-based sampling of a popular variance-exploding SDE.
We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z)
- Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees [57.67528738886731]
We study the numerical stability of scalable sparse approximations based on inducing points.
For low-dimensional tasks such as geospatial modeling, we propose an automated method for computing inducing points satisfying these conditions.
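One simple way to realize a minimum-separation condition (a greedy sketch, not the paper's cover-tree construction) is to keep only candidate inducing points that are at least eps apart, which automatically also covers the inputs:

```python
import numpy as np

def select_inducing(points, eps):
    # greedily keep a point only if it is at least eps from every kept point;
    # any rejected point then lies within eps of some kept point (a cover)
    kept = []
    for p in points:
        if all(np.linalg.norm(p - q) >= eps for q in kept):
            kept.append(p)
    return np.array(kept)

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(500, 2))   # e.g. 2-D geospatial inputs
Z = select_inducing(X, eps=0.1)
print(len(Z))  # far fewer inducing points than inputs, all eps-separated
```

The separation lower-bounds the smallest eigenvalue of the inducing-point kernel matrix, which is what makes the resulting sparse approximation numerically stable.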
arXiv Detail & Related papers (2022-10-14T15:20:17Z)
- A Forward Propagation Algorithm for Online Optimization of Nonlinear Stochastic Differential Equations [1.116812194101501]
We study the convergence of the forward propagation algorithm for nonlinear dissipative SDEs.
We prove bounds on the solution of a partial differential equation (PDE) for the expected time integral of the algorithm's fluctuations around the direction of steepest descent.
Our main result is a convergence theorem for the forward propagation algorithm for nonlinear dissipative SDEs.
arXiv Detail & Related papers (2022-07-10T16:06:42Z)
- Scalable Inference in SDEs by Direct Matching of the Fokker-Planck-Kolmogorov Equation [14.951655356042949]
Simulation-based techniques such as variants of Runge-Kutta are the de facto approach for inference with stochastic differential equations (SDEs) in machine learning.
By instead matching the Fokker-Planck-Kolmogorov equation directly, we show how this workflow is fast, scales to high-dimensional latent spaces, and is applicable to scarce-data applications.
arXiv Detail & Related papers (2021-10-29T12:22:55Z)
- Probabilistic Circuits for Variational Inference in Discrete Graphical Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum-Product Networks (SPNs).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z)
- Identifying Latent Stochastic Differential Equations [29.103393300261587]
We present a method for learning latent stochastic differential equations (SDEs) from high-dimensional time series data.
The proposed method learns the mapping from ambient to latent space, and the underlying SDE coefficients, through a self-supervised learning approach.
We validate the method through several simulated video processing tasks, where the underlying SDE is known, and through real world datasets.
arXiv Detail & Related papers (2020-07-12T19:46:31Z)
- Stochastic Normalizing Flows [52.92110730286403]
We introduce stochastic normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs).
Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs.
These SDEs can be used for constructing efficient chains to sample from the underlying distribution of a given dataset.
arXiv Detail & Related papers (2020-02-21T20:47:55Z)
- A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
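For intuition about this connection (a standard fact, not the paper's second-order scheme): Langevin dynamics dX = -grad U(X) dt + sqrt(2) dW is the Wasserstein gradient flow of the relative entropy to the density proportional to exp(-U), so even a plain Euler-Maruyama discretization drives samples toward the target density:

```python
import numpy as np

rng = np.random.default_rng(4)
n, dt, steps = 20000, 0.01, 500   # samples, step size, number of steps

def grad_U(x):
    return x   # U(x) = x**2 / 2, so the target exp(-U) is the standard Gaussian

x = rng.uniform(-3.0, 3.0, n)     # arbitrary initial distribution
for _ in range(steps):
    # Euler-Maruyama step of Langevin dynamics dX = -grad U(X) dt + sqrt(2) dW
    x += -grad_U(x) * dt + np.sqrt(2.0 * dt) * rng.normal(size=n)

print(x.mean(), x.var())  # approaches mean 0, variance 1
```

The paper's contribution is a smoother, near-optimal numerical scheme for this flow; the sketch above is only the textbook first-order baseline.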
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.