Symplectic Adjoint Method for Exact Gradient of Neural ODE with Minimal
Memory
- URL: http://arxiv.org/abs/2102.09750v1
- Date: Fri, 19 Feb 2021 05:47:14 GMT
- Title: Symplectic Adjoint Method for Exact Gradient of Neural ODE with Minimal
Memory
- Authors: Takashi Matsubara, Yuto Miyatake, Takaharu Yaguchi
- Abstract summary: Backpropagation algorithm requires a memory footprint proportional to the number of uses times the network size.
Otherwise, the adjoint method obtains a gradient by a numerical integration backward in time with a minimal memory footprint.
This study proposes the symplectic adjoint method, which obtains the exact gradient with a footprint proportional to the number of uses plus the network size.
- Score: 7.1975923901054575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A neural network model of a differential equation, namely neural ODE, has
enabled us to learn continuous-time dynamical systems and probabilistic
distributions with a high accuracy. It uses the same network repeatedly during
a numerical integration. Hence, the backpropagation algorithm requires a memory
footprint proportional to the number of uses times the network size. This is
true even if a checkpointing scheme divides the computational graph into
sub-graphs. Otherwise, the adjoint method obtains a gradient by a numerical
integration backward in time with a minimal memory footprint; however, it
suffers from numerical errors. This study proposes the symplectic adjoint
method, which obtains the exact gradient (up to rounding error) with a
footprint proportional to the number of uses plus the network size. The
experimental results demonstrate the symplectic adjoint method occupies the
smallest footprint in most cases, functions faster in some cases, and is robust
to a rounding error among competitive methods.
Related papers
- Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z) - Sampling weights of deep neural networks [1.2370077627846041]
We introduce a probability distribution, combined with an efficient sampling algorithm, for weights and biases of fully-connected neural networks.
In a supervised learning context, no iterative optimization or gradient computations of internal network parameters are needed.
We prove that sampled networks are universal approximators.
arXiv Detail & Related papers (2023-06-29T10:13:36Z) - A memory-efficient neural ODE framework based on high-level adjoint
differentiation [4.063868707697316]
We present a new neural ODE framework, PNODE, based on high-level discrete algorithmic differentiation.
We show that PNODE achieves the highest memory efficiency when compared with other reverse-accurate methods.
arXiv Detail & Related papers (2022-06-02T20:46:26Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Mitigating Performance Saturation in Neural Marked Point Processes:
Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP can significantly reduce training time and the likelihood ratio loss with interarrival time probability assumptions can greatly improve the model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z) - Hessian Aware Quantization of Spiking Neural Networks [1.90365714903665]
Neuromorphic architecture allows massively parallel computation with variable and local bit-precisions.
Current gradient based methods of SNN training use a complex neuron model with multiple state variables.
We present a simplified neuron model that reduces the number of state variables by 4-fold while still being compatible with gradient based training.
arXiv Detail & Related papers (2021-04-29T05:27:34Z) - Local Extreme Learning Machines and Domain Decomposition for Solving
Linear and Nonlinear Partial Differential Equations [0.0]
We present a neural network-based method for solving linear and nonlinear partial differential equations.
The method combines the ideas of extreme learning machines (ELM), domain decomposition and local neural networks.
We compare the current method with the deep Galerkin method (DGM) and the physics-informed neural network (PINN) in terms of the accuracy and computational cost.
arXiv Detail & Related papers (2020-12-04T23:19:39Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Learned Factor Graphs for Inference from Stationary Time Sequences [107.63351413549992]
We propose a framework that combines model-based algorithms and data-driven ML tools for stationary time sequences.
neural networks are developed to separately learn specific components of a factor graph describing the distribution of the time sequence.
We present an inference algorithm based on learned stationary factor graphs, which learns to implement the sum-product scheme from labeled data.
arXiv Detail & Related papers (2020-06-05T07:06:19Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network.
Our model requires a much less number of communication rounds and still a number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.