Efficient, Accurate and Stable Gradients for Neural ODEs
- URL: http://arxiv.org/abs/2410.11648v1
- Date: Tue, 15 Oct 2024 14:36:05 GMT
- Title: Efficient, Accurate and Stable Gradients for Neural ODEs
- Authors: Sam McCallum, James Foster
- Abstract summary: We present a class of algebraically reversible solvers that are both high-order and numerically stable.
This construction naturally extends to numerical schemes for Neural CDEs and SDEs.
- Score: 3.79830302036482
- Abstract: Neural ODEs are a recently developed model class that combines the strong model priors of differential equations with the high-capacity function approximation of neural networks. One advantage of Neural ODEs is the potential for memory-efficient training via the continuous adjoint method. However, memory-efficient training comes at the cost of approximate gradients. Therefore, in practice, gradients are often obtained by simply backpropagating through the internal operations of the forward ODE solve, incurring a high memory cost. Interestingly, it is possible to construct algebraically reversible ODE solvers that allow for both exact gradients and the memory efficiency of the continuous adjoint method. Unfortunately, current reversible solvers are low-order and suffer from poor numerical stability, which limits their use in practice. In this work, we present a class of algebraically reversible solvers that are both high-order and numerically stable. Moreover, any explicit numerical scheme can be made reversible by our method. This construction extends naturally to numerical schemes for Neural CDEs and SDEs.
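To make the idea of algebraic reversibility concrete, here is a minimal Python sketch of a coupled two-state wrapper around an arbitrary explicit step. The specific coupling and the damping parameter lam are illustrative assumptions for this sketch, not necessarily the paper's exact scheme; the point is that each step is an affine map plus recomputable increments, so the forward trajectory can be reconstructed exactly on the backward pass instead of being stored.

```python
import numpy as np

def rk4_step(f, y, t, h):
    # Increment of a classical RK4 step; any explicit one-step
    # method could serve as the base scheme here.
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def reversible_forward(f, y, z, t, h, lam=0.999):
    # Coupled two-state step (illustrative coupling): every operation is
    # affine in the states plus recomputable increments, so the whole
    # step is algebraically invertible.
    y_new = lam * y + (1.0 - lam) * z + rk4_step(f, z, t, h)
    z_new = z - rk4_step(f, y_new, t + h, -h)
    return y_new, z_new

def reversible_backward(f, y_new, z_new, t, h, lam=0.999):
    # Exact algebraic inverse: recover (y, z) without having stored them,
    # which is what allows exact gradients at adjoint-level memory cost.
    z = z_new + rk4_step(f, y_new, t + h, -h)
    y = (y_new - (1.0 - lam) * z - rk4_step(f, z, t, h)) / lam
    return y, z

f = lambda t, y: -y                       # toy ODE dy/dt = -y
y1, z1 = reversible_forward(f, np.array([1.0]), np.array([1.0]), 0.0, 0.1)
y0, z0 = reversible_backward(f, y1, z1, 0.0, 0.1)
print(y0, z0)                             # both recover [1.0] up to float error
```

In a training loop, the backward map lets the continuous adjoint be integrated alongside exactly reconstructed states, which is what yields exact gradients at the memory cost of the adjoint method.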
Related papers
- Accelerated Training through Iterative Gradient Propagation Along the Residual Path [46.577761606415805]
Highway backpropagation is a parallelizable iterative algorithm that approximates backpropagation.
It is adaptable to a diverse set of common architectures, ranging from ResNets and Transformers to recurrent neural networks.
arXiv Detail & Related papers (2025-01-28T17:14:42Z)
- Cycle Encoding of a StyleGAN Encoder for Improved Reconstruction and Editability [76.6724135757723]
GAN inversion aims to invert an input image into the latent space of a pre-trained GAN.
Despite the recent advances in GAN inversion, there remain challenges to mitigate the tradeoff between distortion and editability.
We propose a two-step approach that first inverts the input image into a latent code, called the pivot code, and then alters the generator so that the input image can be accurately mapped into the pivot code.
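A hedged sketch of this two-step recipe in the spirit of pivotal-tuning-style inversion; G, x, and w_init are stand-ins, and the plain MSE loss replaces the perceptual and regularization losses typically used in practice.

```python
import torch

def invert_then_tune(G, x, w_init, steps=200, lr=1e-2):
    # Step 1: freeze the generator, optimise the latent code toward a
    # "pivot" code that approximately reconstructs the input.
    w = w_init.clone().requires_grad_(True)
    opt_w = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(G(w), x)
        opt_w.zero_grad(); loss.backward(); opt_w.step()
    w_pivot = w.detach()

    # Step 2: freeze the pivot, fine-tune the generator so the pivot maps
    # accurately to the input; edits are then applied around w_pivot.
    opt_g = torch.optim.Adam(G.parameters(), lr=lr / 10)
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(G(w_pivot), x)
        opt_g.zero_grad(); loss.backward(); opt_g.step()
    return w_pivot, G

# Toy usage with a linear map standing in for a pretrained StyleGAN:
G = torch.nn.Linear(64, 256)
w_pivot, G = invert_then_tune(G, torch.randn(1, 256), torch.randn(1, 64))
```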
arXiv Detail & Related papers (2022-07-19T16:10:16Z)
- Proximal Implicit ODE Solvers for Accelerating Learning Neural ODEs [16.516974867571175]
This paper considers learning Neural ODEs using implicit ODE solvers of different orders, leveraging proximal operators.
The proximal implicit solver is guaranteed to outperform explicit solvers in numerical stability and computational efficiency.
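As a minimal illustration of the proximal viewpoint (first-order case only; the paper treats higher-order implicit solvers): for a gradient-flow ODE dy/dt = -grad_V(y), the backward-Euler update is exactly a proximal step, which can be approximated by an inner optimization loop.

```python
import numpy as np

def prox_implicit_euler_step(grad_V, y, h, inner_steps=50, lr=0.02):
    # Backward Euler for dy/dt = -grad_V(y) is the proximal step
    #   y_next = argmin_z  V(z) + ||z - y||^2 / (2h),
    # approximated here by an inner gradient-descent loop.
    z = y.copy()
    for _ in range(inner_steps):
        z = z - lr * (grad_V(z) + (z - y) / h)
    return z

grad_V = lambda y: 50.0 * y               # stiff quadratic, V(y) = 25 y^2
y = np.array([1.0])
for _ in range(10):
    y = prox_implicit_euler_step(grad_V, y, h=0.1)
print(y)  # decays monotonically; explicit Euler at h=0.1 would diverge
```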
arXiv Detail & Related papers (2022-04-19T02:55:10Z)
- Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
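A minimal sketch of the equilibrium formulation: the layer's output is the fixed point z* = f(z*, x) found by iteration, rather than the result of a fixed number of unrolled updates; f and the toy inputs below are illustrative stand-ins.

```python
import numpy as np

def deq_solve(f, x, z0, max_iter=200, tol=1e-10):
    # Output of a DEQ layer: the equilibrium z* = f(z*, x), found by damped
    # fixed-point iteration instead of a fixed number of recurrent updates.
    # (Training then differentiates through z* implicitly, at O(1) memory.)
    z = z0
    for _ in range(max_iter):
        z_next = 0.5 * z + 0.5 * f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z

# Toy contractive cell; in flow estimation f would be the recurrent
# refinement operator and x the image features / correlation volume.
W = np.array([[0.3, 0.1], [0.0, 0.2]])
x = np.array([0.5, -0.3])
f = lambda z, x: np.tanh(W @ z + x)
z_star = deq_solve(f, x, np.zeros(2))
print(z_star, np.linalg.norm(z_star - f(z_star, x)))  # residual ~ 0
```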
arXiv Detail & Related papers (2022-04-18T17:53:44Z)
- Learned Cone-Beam CT Reconstruction Using Neural Ordinary Differential Equations [8.621792868567018]
Learned iterative reconstruction algorithms for inverse problems offer the flexibility to combine analytical knowledge about the problem with modules learned from data.
In computed tomography, extending such approaches from 2D fan-beam to 3D cone-beam data is challenging due to prohibitively high GPU memory requirements.
This paper proposes to use neural ordinary differential equations to solve the reconstruction problem in a residual formulation via numerical integration.
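A rough, assumption-laden sketch of the residual formulation: reconstruction is treated as numerically integrating an ODE whose right-hand side combines a data-fidelity gradient with a learned residual. The names data_grad and residual_net below are hypothetical stand-ins, not this paper's modules.

```python
import torch

def neural_ode_reconstruct(x0, data_grad, residual_net, n_steps=10, h=0.1):
    # Euler integration of dx/dt = -data_grad(x) + residual_net(x):
    # the learned residual corrects the analytic data-fidelity flow, and
    # memory stays modest because steps can be recomputed or checkpointed.
    x = x0
    for _ in range(n_steps):
        x = x + h * (-data_grad(x) + residual_net(x))
    return x

recon = neural_ode_reconstruct(
    torch.zeros(1, 1, 8, 8),
    data_grad=lambda x: x - 1.0,                    # toy quadratic data term
    residual_net=torch.nn.Conv2d(1, 1, 3, padding=1),
)
```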
arXiv Detail & Related papers (2022-01-19T12:32:38Z)
- Meta-Solver for Neural Ordinary Differential Equations [77.8918415523446]
We investigate how variability in the solver space can improve the performance of Neural ODEs.
We show that the right choice of solver parameterization can significantly affect the robustness of Neural ODE models to adversarial attacks.
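One concrete example of a "solver parameterization": the classical one-parameter family of two-stage, second-order Runge-Kutta methods, any member of which could be used to train or evaluate a Neural ODE.

```python
import numpy as np

def rk2_family_step(f, t, y, h, alpha):
    # One-parameter family of two-stage, second-order RK methods:
    # alpha = 0.5 is the midpoint method, alpha = 1.0 is Heun's method.
    # A "meta-solver" treats alpha as a tunable degree of freedom.
    k1 = f(t, y)
    k2 = f(t + alpha * h, y + alpha * h * k1)
    return y + h * ((1 - 1 / (2 * alpha)) * k1 + 1 / (2 * alpha) * k2)

f = lambda t, y: -y
for alpha in (0.5, 0.75, 1.0):
    y, t = np.array([1.0]), 0.0
    for _ in range(10):
        y = rk2_family_step(f, t, y, 0.1, alpha); t += 0.1
    print(alpha, y)   # each member lands near exp(-1) ~ 0.3679
```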
arXiv Detail & Related papers (2021-03-15T17:26:34Z)
- Gradient-augmented Supervised Learning of Optimal Feedback Laws Using State-dependent Riccati Equations [0.0]
A stabilizing feedback law is trained on a dataset generated from State-dependent Riccati Equation solves.
High-dimensional nonlinear stabilization tests demonstrate that real-time sequential large-scale Algebraic Riccati Equation solves can be substituted by a suitably trained feedforward neural network.
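A hedged sketch of what "gradient-augmented" supervised learning can look like: regress a value surrogate onto both the values and the gradients harvested from offline SDRE solves. The tensor names and the random stand-in data below are assumptions for illustration.

```python
import torch

def gradient_augmented_loss(model, X, V, dV, mu=1.0):
    # Penalise both the value mismatch and the mismatch of the surrogate's
    # input gradient (via autograd) against the gradient data.
    X = X.requires_grad_(True)
    V_pred = model(X).squeeze(-1)
    dV_pred = torch.autograd.grad(V_pred.sum(), X, create_graph=True)[0]
    return torch.mean((V_pred - V) ** 2) + mu * torch.mean((dV_pred - dV) ** 2)

model = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 1))
# Random stand-ins for (state, value, value-gradient) triples from SDRE solves.
X, V, dV = torch.randn(256, 4), torch.randn(256), torch.randn(256, 4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    loss = gradient_augmented_loss(model, X, V, dV)
    opt.zero_grad(); loss.backward(); opt.step()
```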
arXiv Detail & Related papers (2021-03-06T10:34:23Z)
- GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
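A simplified sketch of this heuristic (assuming torch.func is available; the real method also constrains the gradient norm and handles Adam steps): learn one positive scale per parameter tensor, keeping the initial weight directions fixed, by differentiating through a single simulated SGD step.

```python
import torch
from torch.func import functional_call

def gradinit(model, loss_fn, x, y, eta=0.1, outer_steps=100, lr=1e-2):
    names = [n for n, _ in model.named_parameters()]
    w0 = [p.detach() for _, p in model.named_parameters()]
    log_s = torch.zeros(len(w0), requires_grad=True)   # one scale per tensor
    opt = torch.optim.Adam([log_s], lr=lr)
    for _ in range(outer_steps):
        # Scale the frozen initial weights, take one differentiable SGD
        # step, and minimise the loss *after* that step w.r.t. the scales.
        scaled = {n: w * torch.exp(s) for n, w, s in zip(names, w0, log_s)}
        loss0 = loss_fn(functional_call(model, scaled, (x,)), y)
        grads = torch.autograd.grad(loss0, list(scaled.values()),
                                    create_graph=True)
        stepped = {n: p - eta * g for (n, p), g in zip(scaled.items(), grads)}
        loss1 = loss_fn(functional_call(model, stepped, (x,)), y)
        opt.zero_grad(); loss1.backward(); opt.step()
    return {n: torch.exp(s).item() for n, s in zip(names, log_s)}

model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 2))
x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
scales = gradinit(model, torch.nn.functional.cross_entropy, x, y)
```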
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
- Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization [79.42778415729475]
We explore an alternative solution based on explicit memorization using linear autoencoders for sequences.
We show how such pretraining can better support solving hard classification tasks with long sequences.
We show that the proposed approach achieves a much lower reconstruction error for long sequences and a better gradient propagation during the finetuning phase.
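A rough, gradient-based stand-in for the idea (the paper's linear autoencoders for sequences admit more specialized training than shown here): pretrain a linear encoder to memorize input sequences, then copy its maps into an RNN as initialization before fine-tuning.

```python
import torch

hidden, feat, T = 64, 8, 20
A = torch.nn.Parameter(0.9 * torch.eye(hidden))          # recurrent map
B = torch.nn.Parameter(0.1 * torch.randn(hidden, feat))  # input map
C = torch.nn.Parameter(torch.randn(T * feat, hidden))    # decoder
opt = torch.optim.Adam([A, B, C], lr=1e-3)
for _ in range(200):
    x = torch.randn(32, T, feat)
    h = torch.zeros(32, hidden)
    for t in range(T):            # linear encoding: h <- A h + B x_t
        h = h @ A.T + x[:, t] @ B.T
    # Reconstruct the whole input sequence from the final state.
    loss = torch.nn.functional.mse_loss(h @ C.T, x.reshape(32, -1))
    opt.zero_grad(); loss.backward(); opt.step()

# Use the pretrained maps to initialise an RNN before fine-tuning.
rnn = torch.nn.RNN(feat, hidden, batch_first=True)
with torch.no_grad():
    rnn.weight_hh_l0.copy_(A)
    rnn.weight_ih_l0.copy_(B)
```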
arXiv Detail & Related papers (2020-11-05T14:57:16Z)
- ResNet After All? Neural ODEs and Their Numerical Solution [28.954378025052925]
We show that trained Neural Ordinary Differential Equation models actually depend on the specific numerical method used during training.
We propose a method that monitors the behavior of the ODE solver during training to adapt its step size.
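A minimal sketch of this kind of monitoring, using a standard step-doubling error estimate as a stand-in for the paper's exact criterion: when one full Euler step disagrees with two half steps, the discretization is no longer resolving the dynamics, so the step size is refined.

```python
import numpy as np

def monitored_euler(f, y0, t0, t1, h, tol=1e-3):
    # Step-doubling monitor: compare one full step against two half steps
    # and halve h whenever they disagree beyond the tolerance.
    t, y = t0, np.asarray(y0, dtype=float)
    while t < t1:
        h_step = min(h, t1 - t)
        full = y + h_step * f(t, y)
        half = y + h_step / 2 * f(t, y)
        two_half = half + h_step / 2 * f(t + h_step / 2, half)
        if np.linalg.norm(full - two_half) > tol:
            h = h_step / 2                # refine and retry this step
            continue
        y, t = two_half, t + h_step
    return y

print(monitored_euler(lambda t, y: -10 * y, [1.0], 0.0, 1.0, h=0.25))
```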
arXiv Detail & Related papers (2020-07-30T11:24:05Z)
- Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance.
This paper introduces a framework for updating the gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
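A sketch of the underlying stochastic recursive gradient (SARAH-style) estimator on a toy least-squares problem; applying it inside deep Q-learning, as SRG-DQN does, involves more machinery than shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(256, 10))
b = rng.normal(size=256)
grad_i = lambda w, i: A[i] * (A[i] @ w - b[i])   # per-sample gradient

w = np.zeros(10)
v = A.T @ (A @ w - b) / len(b)   # anchor the recursion at a full gradient
w_prev = w.copy()
for t in range(2000):
    i = rng.integers(len(b))
    # Recursive estimator: v_t = g_i(w_t) - g_i(w_{t-1}) + v_{t-1},
    # correcting the running estimate by a gradient *difference* instead
    # of replacing it with a raw noisy gradient.
    v = grad_i(w, i) - grad_i(w_prev, i) + v
    w_prev = w.copy()
    w = w - 1e-3 * v
print(np.linalg.norm(A.T @ (A @ w - b) / len(b)))  # full-gradient norm shrinks
```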
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.