A memory-efficient neural ODE framework based on high-level adjoint
differentiation
- URL: http://arxiv.org/abs/2206.01298v3
- Date: Fri, 9 Jun 2023 15:43:27 GMT
- Title: A memory-efficient neural ODE framework based on high-level adjoint
differentiation
- Authors: Hong Zhang, Wenjun Zhao
- Abstract summary: We present a new neural ODE framework, PNODE, based on high-level discrete adjoint algorithmic differentiation.
We show that PNODE achieves the highest memory efficiency when compared with other reverse-accurate methods.
- Score: 4.063868707697316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural ordinary differential equations (neural ODEs) have emerged as a novel
network architecture that bridges dynamical systems and deep learning. However,
the gradient obtained with the continuous adjoint method in the vanilla neural
ODE is not reverse-accurate. Other approaches suffer either from an excessive
memory requirement due to deep computational graphs or from limited choices for
the time integration scheme, hampering their application to large-scale complex
dynamical systems. To achieve accurate gradients without compromising memory
efficiency and flexibility, we present a new neural ODE framework, PNODE, based
on high-level discrete adjoint algorithmic differentiation. By leveraging
discrete adjoint time integrators and advanced checkpointing strategies
tailored for these integrators, PNODE can provide a balance between memory and
computational costs, while computing the gradients consistently and accurately.
We provide an open-source implementation based on PyTorch and PETSc, one of the
most commonly used portable, scalable scientific computing libraries. We
demonstrate the performance through extensive numerical experiments on image
classification and continuous normalizing flow problems. We show that PNODE
achieves the highest memory efficiency when compared with other
reverse-accurate methods. On the image classification problems, PNODE is up to
two times faster than the vanilla neural ODE and up to 2.3 times faster than
the best existing reverse-accurate method. We also show that PNODE enables the
use of the implicit time integration methods that are needed for stiff
dynamical systems.
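The memory/compute trade-off the abstract describes can be illustrated with checkpointing: store only a sparse set of solver states during the forward solve, then recompute each segment during the backward sweep, so the gradient is identical to full backpropagation (reverse-accurate) at a fraction of the memory. Below is a minimal PyTorch sketch of this idea for a fixed-step explicit Euler integrator using torch.utils.checkpoint; it illustrates the checkpointing strategy only and is not PNODE's PETSc-based implementation (the names ODEFunc, euler_segment, and odeint_checkpointed are invented for this sketch).

```python
import torch
from torch.utils.checkpoint import checkpoint

class ODEFunc(torch.nn.Module):
    """Right-hand side f(t, y) parameterized by a small MLP (illustrative)."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden),
            torch.nn.Tanh(),
            torch.nn.Linear(hidden, dim),
        )

    def forward(self, t, y):
        return self.net(y)

def euler_segment(f, y, t, dt, k):
    # k explicit Euler steps; the whole segment is recomputed on backward,
    # so none of its intermediate states need to be stored.
    for _ in range(k):
        y = y + dt * f(t, y)
        t = t + dt
    return y

def odeint_checkpointed(f, y0, t0=0.0, t1=1.0, n_steps=64, seg_len=8):
    """Fixed-step Euler solve that keeps one autograd checkpoint per segment.
    Memory scales with n_steps / seg_len instead of n_steps; the backward
    sweep recomputes each segment, trading extra compute for memory while
    the resulting gradient matches full backpropagation exactly."""
    dt = (t1 - t0) / n_steps
    y, t = y0, t0
    for s in range(0, n_steps, seg_len):
        k = min(seg_len, n_steps - s)
        # use_reentrant=False recomputes euler_segment during backward.
        y = checkpoint(euler_segment, f, y, t, dt, k, use_reentrant=False)
        t += k * dt
    return y

f = ODEFunc(dim=4)
y0 = torch.randn(16, 4, requires_grad=True)
loss = odeint_checkpointed(f, y0).pow(2).mean()
loss.backward()  # same gradient as storing the full graph (reverse-accurate)
```

PNODE itself pairs discrete adjoint time integrators with checkpointing schedules tailored to those integrators, which is what additionally admits the implicit methods needed for stiff systems.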
Related papers
- On Tuning Neural ODE for Stability, Consistency and Faster Convergence [0.0]
We propose a first-order Nesterov accelerated gradient (NAG) based ODE solver that is provably tuned with respect to the stability, consistency, and convergence (CCS) conditions.
We empirically demonstrate the efficacy of our approach, training faster while achieving better or comparable performance against the vanilla neural ODE.
arXiv Detail & Related papers (2023-12-04T06:18:10Z) - On Fast Simulation of Dynamical System with Neural Vector Enhanced
Numerical Solver [59.13397937903832]
We introduce a deep learning-based corrector called Neural Vector (NeurVec).
NeurVec can compensate for integration errors and enable larger time step sizes in simulations; a sketch of this corrector idea appears after this list.
Our experiments on a variety of complex dynamical system benchmarks demonstrate that NeurVec exhibits remarkable generalization capability.
arXiv Detail & Related papers (2022-08-07T09:02:18Z) - Training Feedback Spiking Neural Networks by Implicit Differentiation on
the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z) - Learning ODEs via Diffeomorphisms for Fast and Robust Integration [40.52862415144424]
Differentiable solvers are central to learning Neural ODEs.
We propose an alternative approach to learning ODEs from data.
We observe improvements of up to two orders of magnitude when integrating the learned ODEs.
arXiv Detail & Related papers (2021-07-04T14:32:16Z) - Calibrating multi-dimensional complex ODE from noisy data via deep
neural networks [7.77129750333676]
Ordinary differential equations (ODEs) are widely used to model complex dynamics that arise in biology, chemistry, engineering, finance, physics, etc.
We propose a two-stage nonparametric approach to address this problem.
We first extract the de-noised data and their higher-order derivatives using a boundary kernel method, and then feed them into a sparsely connected deep neural network with a ReLU activation function.
arXiv Detail & Related papers (2021-06-07T13:17:16Z) - Accelerating Neural ODEs Using Model Order Reduction [0.0]
We show that mathematical model order reduction methods can be used for compressing and accelerating Neural ODEs.
We implement our novel compression method by developing Neural ODEs that integrate the necessary subspace-projection and interpolation operations as layers of the neural network.
arXiv Detail & Related papers (2021-05-28T19:27:09Z) - Symplectic Adjoint Method for Exact Gradient of Neural ODE with Minimal
Memory [7.1975923901054575]
The backpropagation algorithm requires a memory footprint proportional to the number of uses (network evaluations) times the network size.
The adjoint method, by contrast, obtains a gradient by numerical integration backward in time with a minimal memory footprint, but the gradient is subject to numerical error.
This study proposes the symplectic adjoint method, which obtains the exact gradient with a footprint proportional to the number of uses plus the network size; over many solver steps, storage thus grows additively rather than multiplicatively in the step count.
arXiv Detail & Related papers (2021-02-19T05:47:14Z) - GradInit: Learning to Initialize Neural Networks for Stable and
Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - DiffPD: Differentiable Projective Dynamics with Contact [65.88720481593118]
We present DiffPD, an efficient differentiable soft-body simulator with implicit time integration.
We evaluate the performance of DiffPD and observe a speedup of 4-19 times compared to the standard Newton's method in various applications.
arXiv Detail & Related papers (2021-01-15T00:13:33Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires substantially fewer communication rounds while retaining theoretical convergence guarantees.
Our experiments on several benchmark datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Time Dependence in Non-Autonomous Neural ODEs [74.78386661760662]
We propose a novel family of Neural ODEs with time-varying weights.
We outperform previous Neural ODE variants in both speed and representational capacity.
arXiv Detail & Related papers (2020-05-05T01:41:46Z)
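To make the corrector idea from the NeurVec entry above concrete: a coarse explicit step is augmented with a learned term that absorbs the local integration error, permitting a larger step size. The sketch below is a guess at the general shape under common conventions, not NeurVec's published architecture; Corrector and corrected_euler_step are hypothetical names.

```python
import torch

class Corrector(torch.nn.Module):
    """Hypothetical stand-in for a NeurVec-style corrector: an MLP trained
    to predict the local error of a coarse solver step."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, dim),
        )

    def forward(self, y):
        return self.net(y)

def corrected_euler_step(f, corrector, y, dt):
    # Coarse explicit Euler step plus a learned correction term; training
    # would fit the corrector so this matches a fine-step reference solve.
    return y + dt * f(y) + corrector(y)

# Example rollout with a toy right-hand side dy/dt = -y:
corrector = Corrector(dim=4)
y = torch.randn(8, 4)
for _ in range(10):
    y = corrected_euler_step(lambda y: -y, corrector, y, dt=0.5)
```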