Explicit and Effectively Symmetric Schemes for Neural SDEs
- URL: http://arxiv.org/abs/2509.20599v1
- Date: Wed, 24 Sep 2025 22:43:19 GMT
- Title: Explicit and Effectively Symmetric Schemes for Neural SDEs
- Authors: Daniil Shmelev, Cristopher Salvi,
- Abstract summary: We introduce a novel class of stable, near-reversible Runge--Kutta schemes for neural SDEs.<n>These schemes retain the benefits of reversible solvers while overcoming their instability, enabling memory-efficient training.
- Score: 9.126976857662084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backpropagation through (neural) SDE solvers is traditionally approached in two ways: discretise-then-optimise, which offers accurate gradients but incurs prohibitive memory costs due to storing the full computational graph (even when mitigated by checkpointing); and optimise-then-discretise, which achieves constant memory cost by solving an auxiliary backward SDE, but suffers from slower evaluation and gradient approximation errors. Algebraically reversible solvers promise both memory efficiency and gradient accuracy, yet existing methods such as the Reversible Heun scheme are often unstable under complex models and large step sizes. We address these limitations by introducing a novel class of stable, near-reversible Runge--Kutta schemes for neural SDEs. These Explicit and Effectively Symmetric (EES) schemes retain the benefits of reversible solvers while overcoming their instability, enabling memory-efficient training without severe restrictions on step size or model complexity. Through numerical experiments, we demonstrate the superior stability and reliability of our schemes, establishing them as a practical foundation for scalable and accurate training of neural SDEs.
Related papers
- Parallel Diffusion Solver via Residual Dirichlet Policy Optimization [88.7827307535107]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature.<n>Existing solver-based acceleration methods often face significant image quality degradation under a low-dimensional budget.<n>We propose the Ensemble Parallel Direction solver (dubbed as EPD-EPr), a novel ODE solver that mitigates these errors by incorporating multiple gradient parallel evaluations in each step.
arXiv Detail & Related papers (2025-12-28T05:48:55Z) - Tensor Gaussian Processes: Efficient Solvers for Nonlinear PDEs [16.38975732337055]
TGPS is a machine learning solver for partial differential equations.<n>It reduces the task to learning a collection of one-dimensional GPs.<n>It achieves superior accuracy and efficiency compared to existing approaches.
arXiv Detail & Related papers (2025-10-15T17:23:21Z) - SING: SDE Inference via Natural Gradients [0.0]
We propose SDE Inference via Natural Gradients (SING) to efficiently exploit the underlying geometry of the model and variational posterior.<n>SING enables fast and reliable inference in latent SDE models by approximating intractable integrals and parallelizing computations in time.<n>We show that SING outperforms prior methods in state inference and drift estimation on a variety of datasets.
arXiv Detail & Related papers (2025-06-21T19:36:11Z) - TensorGRaD: Tensor Gradient Robust Decomposition for Memory-Efficient Neural Operator Training [91.8932638236073]
We introduce textbfTensorGRaD, a novel method that directly addresses the memory challenges associated with large-structured weights.<n>We show that sparseGRaD reduces total memory usage by over $50%$ while maintaining and sometimes even improving accuracy.
arXiv Detail & Related papers (2025-01-04T20:51:51Z) - Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality [55.06411438416805]
Constrained Markov Decision Processes (CMDPs) are critical in many high-stakes applications.<n>This paper introduces a novel approach, Two-Stage Deep Decision Rules (TS- DDR) to efficiently train parametric actor policies.<n>It is shown to enhance solution quality and to reduce computation times by several orders of magnitude when compared to current state-of-the-art methods.
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Non-adversarial training of Neural SDEs with signature kernel scores [4.721845865189578]
State-of-the-art performance for irregular time series generation has been previously obtained by training these models adversarially as GANs.
In this paper, we introduce a novel class of scoring rules on pathspace based on signature kernels.
arXiv Detail & Related papers (2023-05-25T17:31:18Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ implicit gradient descent (ISGD) method to train PINNs for improving the stability of training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - SketchySGD: Reliable Stochastic Optimization via Randomized Curvature
Estimates [19.420605210427635]
SketchySGD improves upon existing gradient methods in machine learning by using randomized low-rank approximations to the subsampled Hessian.
We show theoretically that SketchySGD with a fixed stepsize converges linearly to a small ball around the optimum.
In the ill-conditioned setting we show SketchySGD converges at a faster rate than SGD for least-squares problems.
arXiv Detail & Related papers (2022-11-16T01:05:41Z) - An Accelerated Doubly Stochastic Gradient Method with Faster Explicit
Model Identification [97.28167655721766]
We propose a novel doubly accelerated gradient descent (ADSGD) method for sparsity regularized loss minimization problems.
We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z) - Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
arXiv Detail & Related papers (2022-04-18T17:53:44Z) - Neural Stochastic Dual Dynamic Programming [99.80617899593526]
We introduce a trainable neural model that learns to map problem instances to a piece-wise linear value function.
$nu$-SDDP can significantly reduce problem solving cost without sacrificing solution quality.
arXiv Detail & Related papers (2021-12-01T22:55:23Z) - Robust and Scalable SDE Learning: A Functional Perspective [5.642000444047032]
We propose an importance-sampling for probabilities of observations of SDE estimators for the purposes of learning.
The proposed method produces lower-variance estimates compared to algorithms based on SDE.
This facilitates the effective use of large-scale parallel hardware for massive decreases in time.
arXiv Detail & Related papers (2021-10-11T11:36:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.