Scalable Gradients for Stochastic Differential Equations
- URL: http://arxiv.org/abs/2001.01328v6
- Date: Sun, 18 Oct 2020 21:16:05 GMT
- Title: Scalable Gradients for Stochastic Differential Equations
- Authors: Xuechen Li, Ting-Kam Leonard Wong, Ricky T. Q. Chen, David Duvenaud
- Abstract summary: The adjoint sensitivity method scalably computes gradients of solutions to ordinary differential equations.
We generalize this method to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients.
We use our method to fit stochastic dynamics defined by neural networks, achieving competitive performance on a 50-dimensional motion capture dataset.
- Score: 40.70998833051251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The adjoint sensitivity method scalably computes gradients of solutions to
ordinary differential equations. We generalize this method to stochastic
differential equations, allowing time-efficient and constant-memory computation
of gradients with high-order adaptive solvers. Specifically, we derive a
stochastic differential equation whose solution is the gradient, a
memory-efficient algorithm for caching noise, and conditions under which
numerical solutions converge. In addition, we combine our method with
gradient-based stochastic variational inference for latent stochastic
differential equations. We use our method to fit stochastic dynamics defined by
neural networks, achieving competitive performance on a 50-dimensional motion
capture dataset.
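To our knowledge the method ships as the authors' torchsde library (not named in this abstract, so treat that as an assumption). The minimal sketch below follows that library's documented interface, in which an SDE is a module exposing a drift f(t, y) and diffusion g(t, y), and sdeint_adjoint backpropagates through the solve via the backward adjoint SDE:

```python
import torch
import torchsde  # assumed companion library; pip install torchsde

class NeuralSDE(torch.nn.Module):
    # torchsde expects these two attributes plus drift f(t, y) and diffusion g(t, y)
    noise_type = "diagonal"
    sde_type = "ito"

    def __init__(self, dim=3):
        super().__init__()
        self.drift = torch.nn.Linear(dim, dim)
        self.diffusion = torch.nn.Linear(dim, dim)

    def f(self, t, y):                    # drift of dY = f dt + g dW
        return self.drift(y)

    def g(self, t, y):                    # diagonal diffusion
        return torch.sigmoid(self.diffusion(y))

sde = NeuralSDE()
y0 = torch.zeros(16, 3)                   # (batch, state) initial condition
ts = torch.linspace(0.0, 1.0, 20)         # output times
# sdeint_adjoint computes gradients by solving an adjoint SDE backwards in time
# instead of storing the forward trajectory, giving constant-memory training.
ys = torchsde.sdeint_adjoint(sde, y0, ts, method="euler")
loss = ys[-1].pow(2).mean()
loss.backward()                           # gradients reach drift/diffusion parameters
```

The design point matching the abstract is that backward(), like the forward solve, runs in constant memory with respect to the number of solver steps.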
Related papers
- Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation [59.86921150579892]
We deal with the problem of gradient estimation for differentiable relaxations of algorithms, operators, simulators, and other non-differentiable functions.
We develop variance reduction strategies for differentiable sorting and ranking, differentiable shortest-paths on graphs, differentiable rendering for pose estimation, as well as differentiable cryo-ET simulations.
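As background for the smoothing idea (this is the generic Gaussian score-function estimator, not the paper's variance-reduced constructions): a non-differentiable f can be replaced by f_sigma(x) = E_z[f(x + sigma*z)] with z ~ N(0, I), whose gradient E[f(x + sigma*z) * z] / sigma admits a Monte Carlo estimate.

```python
import numpy as np

def smoothed_grad(f, x, sigma=0.1, n=2000, seed=0):
    """Monte Carlo gradient of f_sigma(x) = E_z[f(x + sigma*z)], z ~ N(0, I),
    via the score-function identity grad f_sigma(x) = E[f(x + sigma*z) * z] / sigma,
    with a mean baseline subtracted as a simple variance reduction."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, x.size))
    fx = np.apply_along_axis(f, 1, x + sigma * z)
    return ((fx - fx.mean())[:, None] * z).mean(axis=0) / sigma

# example: a piecewise-constant 0/1 ranking loss becomes differentiable after smoothing
f = lambda v: float(np.argsort(-v)[0] != 0)   # 1 unless v[0] is the largest entry
print(smoothed_grad(f, np.array([0.5, 0.6, -1.0])))
```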
arXiv Detail & Related papers (2024-10-10T17:10:00Z)
- Solving Fractional Differential Equations on a Quantum Computer: A Variational Approach [0.1492582382799606]
We introduce an efficient variational hybrid quantum-classical algorithm designed for solving Caputo time-fractional partial differential equations.
Our results indicate that solution fidelity is insensitive to the fractional index and that gradient evaluation cost scales economically with the number of time steps.
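For reference, the Caputo time-fractional derivative defining such equations is the standard

```latex
{}^{C}\!D_t^{\alpha}\,u(t) = \frac{1}{\Gamma(1-\alpha)} \int_0^t \frac{u'(s)}{(t-s)^{\alpha}}\,ds, \qquad 0 < \alpha < 1,
```

which recovers the ordinary first derivative in the limit α → 1.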
arXiv Detail & Related papers (2024-06-13T02:27:16Z)
- Amortized Reparametrization: Efficient and Scalable Variational Inference for Latent SDEs [3.2634122554914002]
We consider the problem of inferring latent stochastic differential equations with a time and memory cost that scales independently of the amount of data, the total length of the time series, and the stiffness of the approximate differential equations.
This is in stark contrast to typical methods for inferring latent differential equations which, despite their constant memory cost, have a time complexity heavily dependent on the stiffness of the approximate differential equation.
arXiv Detail & Related papers (2023-12-16T22:27:36Z)
- Backward error analysis and the qualitative behaviour of stochastic optimization algorithms: Application to stochastic coordinate descent [1.534667887016089]
We propose a class of stochastic differential equations that approximate the dynamics of general stochastic optimization methods more closely than the original gradient flow.
We study the stability of the modified equation in the case of stochastic coordinate descent.
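For intuition about modified equations, recall the classical backward-error-analysis result for deterministic gradient descent (stated as background; the paper's contribution concerns the stochastic coordinate-descent case): one step of x_{k+1} = x_k - h∇f(x_k) follows, to O(h²), the modified gradient flow

```latex
\dot{x} = -\nabla \tilde{f}(x), \qquad \tilde{f}(x) = f(x) + \frac{h}{4}\,\lVert \nabla f(x) \rVert^{2},
```

so the discrete iterates behave like a gradient flow on a loss with an implicit penalty on the gradient norm.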
arXiv Detail & Related papers (2023-09-05T09:39:56Z)
- Symbolic Recovery of Differential Equations: The Identifiability Problem [52.158782751264205]
Symbolic recovery of differential equations is the ambitious attempt at automating the derivation of governing equations.
We provide both necessary and sufficient conditions for a function to uniquely determine the corresponding differential equation.
We then use our results to devise numerical algorithms aiming to determine whether a function solves a differential equation uniquely.
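A minimal instance of the identifiability problem: a single observed trajectory need not determine its governing equation, since for example

```latex
u(t) = e^{t} \quad \text{satisfies both} \quad u' = u \quad \text{and} \quad u' = u^{2} e^{-t},
```

so uniqueness requires restricting the admissible right-hand sides (e.g., to autonomous polynomial vector fields).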
arXiv Detail & Related papers (2022-10-15T17:32:49Z)
- Rigorous dynamical mean field theory for stochastic gradient descent methods [17.90683687731009]
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods.
This includes widely used algorithms such as stochastic gradient descent (SGD) and Nesterov acceleration.
arXiv Detail & Related papers (2022-10-12T21:10:55Z)
- Automated differential equation solver based on the parametric approximation optimization [77.34726150561087]
The article presents a method that uses an optimization algorithm to obtain a solution in the form of a parameterized approximation.
It allows a wide class of equations to be solved in an automated manner without changing the algorithm's parameters.
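A generic sketch of the parametric-approximation idea (illustrative only; the network, loss weighting, and test equation are our choices, not the article's algorithm): parameterize the candidate solution with a small network and minimize the equation residual at random collocation points plus a boundary penalty.

```python
import torch

# fit u_theta(t) to u' + u = 0 on [0, 1] with u(0) = 1 (exact solution: exp(-t))
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(3000):
    t = torch.rand(64, 1, requires_grad=True)            # random collocation points
    u = net(t)
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    loss = (du + u).pow(2).mean() + (net(torch.zeros(1, 1)) - 1.0).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(net(torch.tensor([[1.0]])).item())                 # should approach exp(-1) ~ 0.368
```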
arXiv Detail & Related papers (2022-05-11T10:06:47Z)
- Last-Iterate Convergence of Saddle-Point Optimizers via High-Resolution Differential Equations [83.3201889218775]
Several widely-used first-order saddle-point optimization methods yield an identical continuous-time ordinary differential equation (ODE) when derived naively.
However, the convergence properties of these methods are qualitatively different, even on simple bilinear games.
We adopt a framework studied in fluid dynamics to design differential equation models for several saddle-point optimization methods.
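The bilinear phenomenon is easy to reproduce (generic illustration, not the paper's high-resolution models): on min_x max_y xy, simultaneous gradient descent-ascent and extragradient discretize the same naive ODE, yet the first spirals outward while the second contracts.

```python
import numpy as np

# min_x max_y x*y: both methods discretize x' = -y, y' = x, but behave differently
def gda(x, y, lr=0.1, steps=100):
    for _ in range(steps):
        x, y = x - lr * y, y + lr * x          # simultaneous gradient descent-ascent
    return x, y

def extragradient(x, y, lr=0.1, steps=100):
    for _ in range(steps):
        xh, yh = x - lr * y, y + lr * x        # lookahead (extrapolation) step
        x, y = x - lr * yh, y + lr * xh        # update using lookahead gradients
    return x, y

print(gda(1.0, 1.0))            # norm grows by sqrt(1 + lr^2) per step: diverges
print(extragradient(1.0, 1.0))  # norm contracts toward the equilibrium (0, 0)
```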
arXiv Detail & Related papers (2021-12-27T18:31:34Z)
- Numerical Solution of Stiff Ordinary Differential Equations with Random Projection Neural Networks [0.0]
We propose a numerical scheme based on Random Projection Neural Networks (RPNNs) for the solution of Ordinary Differential Equations (ODEs).
We show that our proposed scheme yields good numerical approximation accuracy without being affected by stiffness, in some cases outperforming MATLAB's ode45 and ode15s solvers.
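A minimal random-feature collocation sketch in the spirit of RPNNs (the basis, grid, weighting, and test equation are our illustrative choices, not the paper's scheme): hidden weights stay frozen at random values, and only the linear output weights are obtained, in closed form, by least squares over the ODE residual and the initial condition.

```python
import numpy as np

# random fixed features; only linear output weights are solved for (no backprop)
rng = np.random.default_rng(0)
lam, N, M = 5.0, 100, 200                      # test problem: u' = -lam*u, u(0) = 1
w = rng.uniform(-10.0, 10.0, N)                # frozen hidden weights
b = rng.uniform(-1.0, 1.0, N)                  # frozen hidden biases
t = np.linspace(0.0, 1.0, M)[:, None]
phi = np.tanh(w * t + b)                       # (M, N) random feature matrix
dphi = w * (1.0 - phi ** 2)                    # d/dt tanh(w*t + b)
A = np.vstack([dphi + lam * phi,               # collocation rows: u' + lam*u = 0
               np.tanh(b)[None, :]])           # one extra row enforcing u(0) = 1
rhs = np.concatenate([np.zeros(M), [1.0]])
c, *_ = np.linalg.lstsq(A, rhs, rcond=None)    # closed-form least-squares training
# sketch only: accuracy depends on feature scales and how the IC row is weighted
print(np.max(np.abs(phi @ c - np.exp(-lam * t[:, 0]))))  # max error vs exact solution
```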
arXiv Detail & Related papers (2021-08-03T15:49:17Z)
- Stochastic Normalizing Flows [52.92110730286403]
We introduce stochastic normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs).
Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs.
These SDEs can be used to construct efficient chains for sampling from the underlying distribution of a given dataset.
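As a textbook illustration of SDE-based sampling chains (not the paper's neural-SDE construction): the overdamped Langevin equation dX = ∇log p(X) dt + √2 dW has p as its stationary distribution, so an Euler-Maruyama discretization yields a chain whose samples approximate p.

```python
import numpy as np

# overdamped Langevin: dX = grad log p(X) dt + sqrt(2) dW has p as its stationary law
rng = np.random.default_rng(0)
grad_log_p = lambda x: -x                      # target p = N(0, 1), log p = -x^2/2 + c
x, dt, samples = 0.0, 0.01, []
for _ in range(100_000):
    x += grad_log_p(x) * dt + np.sqrt(2.0 * dt) * rng.standard_normal()
    samples.append(x)
print(np.mean(samples), np.var(samples))       # close to the target's 0 and 1
```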
arXiv Detail & Related papers (2020-02-21T20:47:55Z)