Opening the Blackbox: Accelerating Neural Differential Equations by
Regularizing Internal Solver Heuristics
- URL: http://arxiv.org/abs/2105.03918v1
- Date: Sun, 9 May 2021 12:03:03 GMT
- Title: Opening the Blackbox: Accelerating Neural Differential Equations by
Regularizing Internal Solver Heuristics
- Authors: Avik Pal, Yingbo Ma, Viral Shah, Christopher Rackauckas
- Abstract summary: We describe a novel regularization method that uses the internal cost of adaptive differential equation solvers combined with discrete sensitivities to guide the training process.
This approach opens up the blackbox numerical analysis behind the differential equation solver's algorithm and uses its local error estimates and stiffness heuristics as cheap and accurate cost estimates.
We demonstrate how our approach can halve the prediction time while also reducing training time, unlike other methods which can increase it by an order of magnitude.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Democratization of machine learning requires architectures that automatically
adapt to new problems. Neural Differential Equations (NDEs) have emerged as a
popular modeling framework by removing the need for ML practitioners to choose
the number of layers in a recurrent model. While we can control the
computational cost by choosing the number of layers in standard architectures,
in NDEs the number of neural network evaluations for a forward pass can depend
on the number of steps of the adaptive ODE solver. But can we force the NDE to
learn the version with the fewest steps without increasing the training cost?
Current strategies to overcome slow prediction require high order automatic
differentiation, leading to significantly higher training time. We describe a
novel regularization method that uses the internal cost heuristics of adaptive
differential equation solvers combined with discrete adjoint sensitivities to
guide the training process towards learning NDEs that are easier to solve. This
approach opens up the blackbox numerical analysis behind the differential
equation solver's algorithm and directly uses its local error estimates and
stiffness heuristics as cheap and accurate cost estimates. We incorporate our
method without any change in the underlying NDE framework and show that our
method extends beyond Ordinary Differential Equations to accommodate Neural
Stochastic Differential Equations. We demonstrate that our approach can halve
the prediction time and, unlike other methods which can increase the training
time by an order of magnitude, can similarly reduce training
times. Together this showcases how the knowledge embedded within
state-of-the-art equation solvers can be used to enhance machine learning.
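The mechanism can be illustrated with a short, self-contained sketch. The NumPy code below is not the authors' implementation (the paper couples these estimates to discrete adjoint sensitivities, which are not reproduced here); it simply integrates a toy neural vector field with an adaptive Heun/Euler pair and accumulates the embedded local error estimate as a regularization term added to a data loss. The toy network, step-size controller constants, and regularization weight are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.3 * rng.standard_normal((16, 2)), np.zeros(16)
W2, b2 = 0.3 * rng.standard_normal((2, 16)), np.zeros(2)

def f(t, y):
    # Toy MLP vector field standing in for the neural network of the NDE.
    return W2 @ np.tanh(W1 @ y + b1) + b2

def solve_with_error_reg(y0, t0, t1, rtol=1e-3, atol=1e-6):
    # Adaptive Heun (2nd order) step with an embedded Euler (1st order) estimate.
    # The weighted difference between the two is the solver's local error estimate;
    # accumulating it stands in for the paper's internal-cost regularizer.
    t, y, dt, reg = t0, np.asarray(y0, dtype=float), (t1 - t0) / 50.0, 0.0
    while t < t1 - 1e-12:
        dt = min(dt, t1 - t)
        k1 = f(t, y)
        k2 = f(t + dt, y + dt * k1)
        y_heun = y + 0.5 * dt * (k1 + k2)   # accepted 2nd-order solution
        y_euler = y + dt * k1               # embedded 1st-order solution
        scale = atol + rtol * np.maximum(np.abs(y), np.abs(y_heun))
        err = np.linalg.norm((y_heun - y_euler) / scale) / np.sqrt(y.size)
        if err <= 1.0:                      # step accepted
            reg += dt * err                 # error-estimate regularizer (illustrative weighting)
            t, y = t + dt, y_heun
        # standard step-size controller for a 1st-order error estimator
        dt *= min(5.0, max(0.2, 0.9 * (1.0 / max(err, 1e-10)) ** 0.5))
    return y, reg

y_pred, reg_term = solve_with_error_reg(np.array([1.0, 0.0]), 0.0, 1.0)
data_loss = float(np.sum((y_pred - np.array([0.5, -0.5])) ** 2))  # placeholder target
total_loss = data_loss + 0.1 * reg_term                           # lambda = 0.1 is an assumed weight
print(total_loss)
```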
Related papers
- Implementation and (Inverse Modified) Error Analysis for implicitly-templated ODE-nets [0.0]
We focus on learning unknown dynamics from data using ODE-nets templated on implicit numerical initial value problem solvers.
We perform Inverse Modified error analysis of the ODE-nets using unrolled implicit schemes for ease of interpretation.
We formulate an adaptive algorithm which monitors the level of error and adapts the number of (unrolled) implicit solution iterations.
arXiv Detail & Related papers (2023-03-31T06:47:02Z)
- Locally Regularized Neural Differential Equations: Some Black Boxes Were Meant to Remain Closed! [3.222802562733787]
Implicit layer deep learning techniques, like Neural Differential Equations, have become an important modeling framework.
We develop two sampling strategies to trade off between performance and training time.
Our method reduces the number of function evaluations to 0.556-0.733x and accelerates predictions by 1.3-2x.
arXiv Detail & Related papers (2023-03-03T23:31:15Z)
- On Robust Numerical Solver for ODE via Self-Attention Mechanism [82.95493796476767]
We explore training efficient and robust AI-enhanced numerical solvers with a small data size by mitigating intrinsic noise disturbances.
We first analyze the ability of the self-attention mechanism to regulate noise in supervised learning and then propose a simple-yet-effective numerical solver, Attr, which introduces an additive self-attention mechanism to the numerical solution of differential equations.
arXiv Detail & Related papers (2023-02-05T01:39:21Z)
- Experimental study of Neural ODE training with adaptive solver for dynamical systems modeling [72.84259710412293]
Some ODE solvers, called adaptive solvers, can adapt their evaluation strategy depending on the complexity of the problem at hand.
This paper describes a simple set of experiments to show why adaptive solvers cannot be seamlessly leveraged as a black-box for dynamical systems modelling.
arXiv Detail & Related papers (2022-11-13T17:48:04Z)
- A memory-efficient neural ODE framework based on high-level adjoint differentiation [4.063868707697316]
We present a new neural ODE framework, PNODE, based on high-level discrete algorithmic differentiation.
We show that PNODE achieves the highest memory efficiency when compared with other reverse-accurate methods.
arXiv Detail & Related papers (2022-06-02T20:46:26Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Distributional Gradient Matching for Learning Uncertain Neural Dynamics Models [38.17499046781131]
We propose a novel approach towards estimating uncertain neural ODEs, avoiding the numerical integration bottleneck.
Our algorithm - distributional gradient matching (DGM) - jointly trains a smoother and a dynamics model and matches their gradients via minimizing a Wasserstein loss.
Our experiments show that, compared to traditional approximate inference methods based on numerical integration, our approach is faster to train, faster at predicting previously unseen trajectories, and in the context of neural ODEs, significantly more accurate.
arXiv Detail & Related papers (2021-06-22T08:40:51Z)
- Meta-Solver for Neural Ordinary Differential Equations [77.8918415523446]
We investigate how variability in the space of solvers can improve the performance of neural ODEs.
We show that the right choice of solver parameterization can significantly affect neural ODE models in terms of robustness to adversarial attacks.
arXiv Detail & Related papers (2021-03-15T17:26:34Z)
- Learning Differential Equations that are Easy to Solve [26.05208133659686]
We introduce a differentiable surrogate for the time cost of standard numerical solvers, using higher-order derivatives of solution trajectories.
We demonstrate our approach by training models that are substantially faster to solve, while nearly as accurate, on supervised classification, density estimation, and time-series modelling tasks (a minimal sketch of this regularizer appears after this list).
arXiv Detail & Related papers (2020-07-09T01:39:34Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- Physarum Powered Differentiable Linear Programming Layers and Applications [48.77235931652611]
We propose an efficient and differentiable solver for general linear programming problems.
We show the use of our solver in a video segmentation task and meta-learning for few-shot learning.
arXiv Detail & Related papers (2020-04-30T01:50:37Z)
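For contrast with the solver-internal heuristics above, the "Learning Differential Equations that are Easy to Solve" entry regularizes higher-order time derivatives of the solution trajectory, which is what requires the high-order automatic differentiation mentioned in the abstract. A minimal sketch of that idea follows; the original work uses Taylor-mode automatic differentiation for exact K-th order derivatives, so the finite-difference estimate, toy dynamics, and weights below are only illustrative assumptions.

```python
import numpy as np

def f(t, y):
    # Toy vector field standing in for the neural network of a neural ODE
    # (a damped pendulum is used purely as a stand-in).
    return np.array([y[1], -np.sin(y[0]) - 0.1 * y[1]])

def rk4_trajectory(y0, t0, t1, n_steps=100):
    # Fixed-step RK4 solve that records the whole trajectory.
    dt = (t1 - t0) / n_steps
    y, t = np.asarray(y0, dtype=float), t0
    ys = [y]
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt / 2 * k1)
        k3 = f(t + dt / 2, y + dt / 2 * k2)
        k4 = f(t + dt, y + dt * k3)
        y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
        ys.append(y)
    return np.array(ys), dt

ys, dt = rk4_trajectory([1.0, 0.0], 0.0, 10.0)
# Crude finite-difference estimate of d^2 y / dt^2 along the trajectory
# (the original work computes exact higher derivatives with Taylor-mode AD).
d2y = (ys[2:] - 2 * ys[1:-1] + ys[:-2]) / dt**2
reg = float(np.mean(np.sum(d2y**2, axis=1)))            # K = 2 derivative penalty (assumed K)
data_loss = float(np.sum((ys[-1] - np.array([0.0, 0.0])) ** 2))  # placeholder target
total_loss = data_loss + 0.01 * reg                      # lambda = 0.01 is an assumed weight
print(total_loss)
```

Both sketches show only the forward computation of a regularizer; during training one would differentiate through it, which is where the discrete-adjoint route of the main paper avoids the overhead of higher-order derivatives.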
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.