A memory-efficient neural ODE framework based on high-level adjoint
differentiation
- URL: http://arxiv.org/abs/2206.01298v3
- Date: Fri, 9 Jun 2023 15:43:27 GMT
- Title: A memory-efficient neural ODE framework based on high-level adjoint
differentiation
- Authors: Hong Zhang, Wenjun Zhao
- Abstract summary: We present a new neural ODE framework, PNODE, based on high-level discrete adjoint algorithmic differentiation.
We show that PNODE achieves the highest memory efficiency when compared with other reverse-accurate methods.
- Score: 4.063868707697316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural ordinary differential equations (neural ODEs) have emerged as a novel
network architecture that bridges dynamical systems and deep learning. However,
the gradient obtained with the continuous adjoint method in the vanilla neural
ODE is not reverse-accurate. Other approaches suffer either from an excessive
memory requirement due to deep computational graphs or from limited choices for
the time integration scheme, hampering their application to large-scale complex
dynamical systems. To achieve accurate gradients without compromising memory
efficiency and flexibility, we present a new neural ODE framework, PNODE, based
on high-level discrete adjoint algorithmic differentiation. By leveraging
discrete adjoint time integrators and advanced checkpointing strategies
tailored for these integrators, PNODE can provide a balance between memory and
computational costs, while computing the gradients consistently and accurately.
We provide an open-source implementation based on PyTorch and PETSc, one of the
most commonly used portable, scalable scientific computing libraries. We
demonstrate the performance through extensive numerical experiments on image
classification and continuous normalizing flow problems. We show that PNODE
achieves the highest memory efficiency when compared with other
reverse-accurate methods. On the image classification problems, PNODE is up to
two times faster than the vanilla neural ODE and up to 2.3 times faster than
the best existing reverse-accurate method. We also show that PNODE enables the
use of the implicit time integration methods that are needed for stiff
dynamical systems.
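The memory/compute trade-off the abstract describes can be illustrated with checkpointing: store only a sparse set of solver states during the forward solve, then recompute each segment during the backward sweep, so the gradient is identical to full backpropagation (reverse-accurate) at a fraction of the memory. Below is a minimal PyTorch sketch of this idea for a fixed-step explicit Euler integrator using torch.utils.checkpoint; it illustrates the checkpointing strategy only and is not PNODE's PETSc-based implementation (the names ODEFunc, euler_segment, and odeint_checkpointed are invented for this sketch).

```python
import torch
from torch.utils.checkpoint import checkpoint

class ODEFunc(torch.nn.Module):
    """Right-hand side f(t, y) parameterized by a small MLP (illustrative)."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden),
            torch.nn.Tanh(),
            torch.nn.Linear(hidden, dim),
        )

    def forward(self, t, y):
        return self.net(y)

def euler_segment(f, y, t, dt, k):
    # k explicit Euler steps; the whole segment is recomputed on backward,
    # so none of its intermediate states need to be stored.
    for _ in range(k):
        y = y + dt * f(t, y)
        t = t + dt
    return y

def odeint_checkpointed(f, y0, t0=0.0, t1=1.0, n_steps=64, seg_len=8):
    """Fixed-step Euler solve that keeps one autograd checkpoint per segment.
    Memory scales with n_steps / seg_len instead of n_steps; the backward
    sweep recomputes each segment, trading extra compute for memory while
    the resulting gradient matches full backpropagation exactly."""
    dt = (t1 - t0) / n_steps
    y, t = y0, t0
    for s in range(0, n_steps, seg_len):
        k = min(seg_len, n_steps - s)
        # use_reentrant=False recomputes euler_segment during backward.
        y = checkpoint(euler_segment, f, y, t, dt, k, use_reentrant=False)
        t += k * dt
    return y

f = ODEFunc(dim=4)
y0 = torch.randn(16, 4, requires_grad=True)
loss = odeint_checkpointed(f, y0).pow(2).mean()
loss.backward()  # same gradient as storing the full graph (reverse-accurate)
```

PNODE itself pairs discrete adjoint time integrators with checkpointing schedules tailored to those integrators, which is what additionally admits the implicit methods needed for stiff systems.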
Related papers
- On Tuning Neural ODE for Stability, Consistency and Faster Convergence [0.0]
We propose a first-order Nesterov accelerated gradient (NAG) based ODE solver that is provably tuned with respect to the stability, consistency, and convergence (CCS) conditions.
We empirically demonstrate the efficacy of our approach, training faster while achieving better or comparable performance against the vanilla neural ODE.
arXiv Detail & Related papers (2023-12-04T06:18:10Z) - On Fast Simulation of Dynamical System with Neural Vector Enhanced
Numerical Solver [59.13397937903832]
We introduce a deep learning-based corrector called Neural Vector (NeurVec).
NeurVec can compensate for integration errors and enable larger time step sizes in simulations; a sketch of this corrector idea appears after this list.
Our experiments on a variety of complex dynamical system benchmarks demonstrate that NeurVec exhibits remarkable generalization capability.
arXiv Detail & Related papers (2022-08-07T09:02:18Z) - Training Feedback Spiking Neural Networks by Implicit Differentiation on
the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z) - Learning ODEs via Diffeomorphisms for Fast and Robust Integration [40.52862415144424]
Differentiable solvers are central to learning Neural ODEs.
We propose an alternative approach to learning ODEs from data.
We observe improvements of up to two orders of magnitude when integrating the learned ODEs.
arXiv Detail & Related papers (2021-07-04T14:32:16Z) - Calibrating multi-dimensional complex ODE from noisy data via deep
neural networks [7.77129750333676]
Ordinary differential equations (ODEs) are widely used to model complex dynamics that arise in biology, chemistry, engineering, finance, physics, etc.
We propose a two-stage nonparametric approach to address this problem.
We first extract the de-noised data and their higher-order derivatives using a boundary kernel method, and then feed them into a sparsely connected deep neural network with a ReLU activation function.
arXiv Detail & Related papers (2021-06-07T13:17:16Z) - Accelerating Neural ODEs Using Model Order Reduction [0.0]
We show that mathematical model order reduction methods can be used for compressing and accelerating Neural ODEs.
We implement our novel compression method by developing Neural ODEs that integrate the necessary subspace-projection and interpolation operations as layers of the neural network.
arXiv Detail & Related papers (2021-05-28T19:27:09Z) - Symplectic Adjoint Method for Exact Gradient of Neural ODE with Minimal
Memory [7.1975923901054575]
The backpropagation algorithm requires a memory footprint proportional to the number of uses (network evaluations) times the network size.
The adjoint method, by contrast, obtains a gradient by numerical integration backward in time with a minimal memory footprint, but the gradient is subject to numerical error.
This study proposes the symplectic adjoint method, which obtains the exact gradient with a footprint proportional to the number of uses plus the network size; over many solver steps, storage thus grows additively rather than multiplicatively in the step count.
arXiv Detail & Related papers (2021-02-19T05:47:14Z) - GradInit: Learning to Initialize Neural Networks for Stable and
Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - DiffPD: Differentiable Projective Dynamics with Contact [65.88720481593118]
We present DiffPD, an efficient differentiable soft-body simulator with implicit time integration.
We evaluate the performance of DiffPD and observe a speedup of 4-19 times compared to the standard Newton's method in various applications.
arXiv Detail & Related papers (2021-01-15T00:13:33Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires substantially fewer communication rounds while retaining theoretical convergence guarantees.
Our experiments on several benchmark datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Time Dependence in Non-Autonomous Neural ODEs [74.78386661760662]
We propose a novel family of Neural ODEs with time-varying weights.
We outperform previous Neural ODE variants in both speed and representational capacity.
arXiv Detail & Related papers (2020-05-05T01:41:46Z)
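To make the corrector idea from the NeurVec entry above concrete: a coarse explicit step is augmented with a learned term that absorbs the local integration error, permitting a larger step size. The sketch below is a guess at the general shape under common conventions, not NeurVec's published architecture; Corrector and corrected_euler_step are hypothetical names.

```python
import torch

class Corrector(torch.nn.Module):
    """Hypothetical stand-in for a NeurVec-style corrector: an MLP trained
    to predict the local error of a coarse solver step."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, dim),
        )

    def forward(self, y):
        return self.net(y)

def corrected_euler_step(f, corrector, y, dt):
    # Coarse explicit Euler step plus a learned correction term; training
    # would fit the corrector so this matches a fine-step reference solve.
    return y + dt * f(y) + corrector(y)

# Example rollout with a toy right-hand side dy/dt = -y:
corrector = Corrector(dim=4)
y = torch.randn(8, 4)
for _ in range(10):
    y = corrected_euler_step(lambda y: -y, corrector, y, dt=0.5)
```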