Discretize-Optimize vs. Optimize-Discretize for Time-Series Regression
and Continuous Normalizing Flows
- URL: http://arxiv.org/abs/2005.13420v2
- Date: Fri, 31 Jul 2020 00:28:12 GMT
- Title: Discretize-Optimize vs. Optimize-Discretize for Time-Series Regression
and Continuous Normalizing Flows
- Authors: Derek Onken and Lars Ruthotto
- Abstract summary: We compare the discretize-optimize (Disc-Opt) and optimize-discretize (Opt-Disc) approaches for time-series regression and continuous normalizing flows (CNFs) using neural ODEs.
Disc-Opt reduced costs in six out of seven separate problems, with training time reductions ranging from 39% to 97%; in one case, Disc-Opt reduced training from nine days to less than one day.
- Score: 5.71097144710995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We compare the discretize-optimize (Disc-Opt) and optimize-discretize
(Opt-Disc) approaches for time-series regression and continuous normalizing
flows (CNFs) using neural ODEs. Neural ODEs are ordinary differential equations
(ODEs) with neural network components. Training a neural ODE is an optimal
control problem where the weights are the controls and the hidden features are
the states. Every training iteration involves solving an ODE forward and
another backward in time, which can require large amounts of computation, time,
and memory. Comparing the Opt-Disc and Disc-Opt approaches in image
classification tasks, Gholami et al. (2019) suggest that Disc-Opt is preferable
due to the guaranteed accuracy of gradients. In this paper, we extend the
comparison to neural ODEs for time-series regression and CNFs. Unlike in
classification, meaningful models in these tasks must also satisfy additional
requirements beyond accurate final-time output, e.g., the invertibility of the
CNF. Through our numerical experiments, we demonstrate that with careful
numerical treatment, Disc-Opt methods can achieve similar performance as
Opt-Disc at inference with drastically reduced training costs. Disc-Opt reduced
costs in six out of seven separate problems with training time reduction
ranging from 39% to 97%, and in one case, Disc-Opt reduced training from nine
days to less than one day.
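The abstract contrasts the two orderings only at a high level. The sketch below is a minimal illustration, not code from the paper: it trains a toy neural ODE for time-series-style regression in the Disc-Opt fashion, discretizing the forward ODE with a fixed-step RK4 integrator and letting automatic differentiation backpropagate through those explicit steps, so the gradient is exact for the chosen discretization. An Opt-Disc (adjoint) implementation would instead derive continuous adjoint equations and solve a second ODE backward in time. The `Velocity` network, step count, and toy data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical velocity field f(x, t; theta): a small MLP standing in for the
# neural-network component of the neural ODE dx/dt = f(x, t; theta).
class Velocity(nn.Module):
    def __init__(self, dim, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, width), nn.Tanh(), nn.Linear(width, dim)
        )

    def forward(self, x, t):
        t_col = torch.full_like(x[:, :1], t)
        return self.net(torch.cat([x, t_col], dim=1))


def rk4_forward(f, x0, t0=0.0, t1=1.0, n_steps=8):
    """Fixed-step RK4 discretization of the forward ODE.

    Disc-Opt: autograd differentiates through these explicit steps, so the
    gradient is exact for this discretization; no separate backward (adjoint)
    ODE solve with its own discretization error is needed.
    """
    h = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        k1 = f(x, t)
        k2 = f(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(x + h * k3, t + h)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + h
    return x


# Toy regression: fit the final-time state to targets y.
dim = 2
f = Velocity(dim)
opt = torch.optim.Adam(f.parameters(), lr=1e-3)
x0 = torch.randn(128, dim)
y = torch.randn(128, dim)

for step in range(200):
    opt.zero_grad()
    x1 = rk4_forward(f, x0)      # discretize ...
    loss = ((x1 - y) ** 2).mean()
    loss.backward()              # ... then optimize (exact discrete gradient)
    opt.step()
```

Because the computed gradient is the exact gradient of the discretized objective, relatively coarse discretizations remain trainable, which is one way careful Disc-Opt treatment can reduce training cost as reported above.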
Related papers
- On Tuning Neural ODE for Stability, Consistency and Faster Convergence [0.0]
We propose a first-order ODE solver based on Nesterov's accelerated gradient (NAG) that is provably tuned with respect to CCS conditions.
We empirically demonstrate the efficacy of our approach: it trains faster while achieving better or comparable performance relative to standard neural ODEs.
arXiv Detail & Related papers (2023-12-04T06:18:10Z) - Locally Regularized Neural Differential Equations: Some Black Boxes Were
Meant to Remain Closed! [3.222802562733787]
Implicit layer deep learning techniques, like Neural Differential Equations, have become an important modeling framework.
We develop two sampling strategies to trade off between performance and training time.
Our method reduces the number of function evaluations to 0.556-0.733x and accelerates predictions by 1.3-2x.
arXiv Detail & Related papers (2023-03-03T23:31:15Z) - Towards Memory- and Time-Efficient Backpropagation for Training Spiking
Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z) - A memory-efficient neural ODE framework based on high-level adjoint
differentiation [4.063868707697316]
We present a new neural ODE framework, PNODE, based on high-level discrete algorithmic differentiation.
We show that PNODE achieves the highest memory efficiency when compared with other reverse-accurate methods.
arXiv Detail & Related papers (2022-06-02T20:46:26Z) - Training Feedback Spiking Neural Networks by Implicit Differentiation on
the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z) - Differentiable Multiple Shooting Layers [18.37758865401204]
Multiple Shooting Layers (MSLs) seek solutions of initial value problems via parallelizable root-finding algorithms.
We develop the algorithmic framework of MSLs, analyzing the different choices of solution methods from a theoretical and computational perspective.
We investigate the speedups obtained through application of MSL inference in neural controlled differential equations (Neural CDEs) for time series classification of medical data.
arXiv Detail & Related papers (2021-06-07T18:05:44Z) - Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z) - ResNet After All? Neural ODEs and Their Numerical Solution [28.954378025052925]
We show that trained Neural Ordinary Differential Equation models actually depend on the specific numerical method used during training.
We propose a method that monitors the behavior of the ODE solver during training to adapt its step size.
arXiv Detail & Related papers (2020-07-30T11:24:05Z) - STEER: Simple Temporal Regularization For Neural ODEs [80.80350769936383]
We propose a new regularization technique: randomly sampling the end time of the ODE during training (a minimal sketch appears after this list).
The proposed regularization is simple to implement, has negligible overhead and is effective across a wide variety of tasks.
We show through experiments on normalizing flows, time series models and image recognition that the proposed regularization can significantly decrease training time and even improve performance over baseline models.
arXiv Detail & Related papers (2020-06-18T17:44:50Z) - Time Dependence in Non-Autonomous Neural ODEs [74.78386661760662]
We propose a novel family of Neural ODEs with time-varying weights.
We outperform previous Neural ODE variants in both speed and representational capacity.
arXiv Detail & Related papers (2020-05-05T01:41:46Z) - Stochasticity in Neural ODEs: An Empirical Study [68.8204255655161]
Regularization of neural networks (e.g. dropout) is a widespread technique in deep learning that allows for better generalization.
We show that data augmentation during training improves the performance of both the deterministic and stochastic versions of the same model.
However, the improvements obtained by the data augmentation completely eliminate the empirical regularization gains, making the performance difference between neural ODEs and neural SDEs negligible.
arXiv Detail & Related papers (2020-02-22T22:12:56Z)
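The STEER entry above describes its regularizer in a single line; the fragment below is a speculative sketch of what end-time sampling could look like inside a training loop. The uniform sampling interval, the `odesolve` argument (any differentiable integrator, e.g. the RK4 loop sketched after the main abstract), and all names are assumptions, not details taken from the STEER paper.

```python
import random

def sample_end_time(t1=1.0, b=0.5):
    # Assumed form of the regularizer: draw the integration end time
    # uniformly from an interval around the nominal end time t1.
    return random.uniform(t1 - b, t1 + b)

def train_step(model, odesolve, loss_fn, x0, y, optimizer, t0=0.0):
    # One iteration with a randomly sampled integration horizon.
    # `odesolve(f, x0, t0, t1)` stands for any differentiable ODE solver.
    t1 = sample_end_time()
    optimizer.zero_grad()
    x1 = odesolve(model, x0, t0, t1)
    loss = loss_fn(x1, y)
    loss.backward()
    optimizer.step()
    return loss.item()
```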
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.