Mixed Precision Training of Neural ODEs
- URL: http://arxiv.org/abs/2510.23498v1
- Date: Mon, 27 Oct 2025 16:32:56 GMT
- Title: Mixed Precision Training of Neural ODEs
- Authors: Elena Celledoni, Brynjulf Owren, Lars Ruthotto, Tianjiao Nicole Yang,
- Abstract summary: This paper presents a mixed precision training framework for neural ODEs.<n>It combines explicit ODE solvers with a custom backpropagation scheme.<n>It achieves approximately 50% memory reduction and up to 2x speedup while maintaining accuracy comparable to single-precision training.
- Score: 1.3382837742547355
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Exploiting low-precision computations has become a standard strategy in deep learning to address the growing computational costs imposed by ever larger models and datasets. However, naively performing all computations in low precision can lead to roundoff errors and instabilities. Therefore, mixed precision training schemes usually store the weights in high precision and use low-precision computations only for whitelisted operations. Despite their success, these principles are currently not reliable for training continuous-time architectures such as neural ordinary differential equations (Neural ODEs). This paper presents a mixed precision training framework for neural ODEs, combining explicit ODE solvers with a custom backpropagation scheme, and demonstrates its effectiveness across a range of learning tasks. Our scheme uses low-precision computations for evaluating the velocity, parameterized by the neural network, and for storing intermediate states, while stability is provided by a custom dynamic adjoint scaling and by accumulating the solution and gradients in higher precision. These contributions address two key challenges in training neural ODE: the computational cost of repeated network evaluations and the growth of memory requirements with the number of time steps or layers. Along with the paper, we publish our extendable, open-source PyTorch package rampde, whose syntax resembles that of leading packages to provide a drop-in replacement in existing codes. We demonstrate the reliability and effectiveness of our scheme using challenging test cases and on neural ODE applications in image classification and generative models, achieving approximately 50% memory reduction and up to 2x speedup while maintaining accuracy comparable to single-precision training.
Related papers
- Full Integer Arithmetic Online Training for Spiking Neural Networks [0.006486143522483092]
Spiking Neural Networks (SNNs) are promising for neuromorphic computing due to their biological plausibility and energy efficiency.<n>This work introduces an integer-only, online training algorithm using a mixed-precision approach to improve efficiency and reduce memory usage by over 60%.
arXiv Detail & Related papers (2025-09-08T12:54:30Z) - Self-Composing Neural Operators with Depth and Accuracy Scaling via Adaptive Train-and-Unroll Approach [12.718377513965912]
We propose a novel framework to enhance the efficiency and accuracy of neural operators through self-composition.<n>Inspired by iterative methods in solving numerical partial differential equations (PDEs), we design a specific neural operator by repeatedly applying a single neural operator block.<n>We introduce an adaptive train-and-unroll approach, where the depth of the neural operator is gradually increased during training.
arXiv Detail & Related papers (2025-08-28T10:53:00Z) - LaPON: A Lagrange's-mean-value-theorem-inspired operator network for solving PDEs and its application on NSE [8.014720523981385]
We propose LaPON, an operator network inspired by the Lagrange's mean value theorem.<n>It embeds prior knowledge directly into the neural network architecture instead of the loss function.<n>LaPON provides a scalable and reliable solution for high-fidelity fluid dynamics simulation.
arXiv Detail & Related papers (2025-05-18T10:45:17Z) - Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive
Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how by simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z) - On Fast Simulation of Dynamical System with Neural Vector Enhanced
Numerical Solver [59.13397937903832]
We introduce a deep learning-based corrector called Neural Vector (NeurVec)
NeurVec can compensate for integration errors and enable larger time step sizes in simulations.
Our experiments on a variety of complex dynamical system benchmarks demonstrate that NeurVec exhibits remarkable generalization capability.
arXiv Detail & Related papers (2022-08-07T09:02:18Z) - A memory-efficient neural ODE framework based on high-level adjoint
differentiation [4.063868707697316]
We present a new neural ODE framework, PNODE, based on high-level discrete algorithmic differentiation.
We show that PNODE achieves the highest memory efficiency when compared with other reverse-accurate methods.
arXiv Detail & Related papers (2022-06-02T20:46:26Z) - LCS: Learning Compressible Subspaces for Adaptive Network Compression at
Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - GradInit: Learning to Initialize Neural Networks for Stable and
Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture method for initializing neural networks.
It is based on a simple agnostic; the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.