TO-FLOW: Efficient Continuous Normalizing Flows with Temporal
Optimization adjoint with Moving Speed
- URL: http://arxiv.org/abs/2203.10335v1
- Date: Sat, 19 Mar 2022 14:56:41 GMT
- Title: TO-FLOW: Efficient Continuous Normalizing Flows with Temporal
Optimization adjoint with Moving Speed
- Authors: Shian Du, Yihong Luo, Wei Chen, Jian Xu, Delu Zeng
- Abstract summary: Continuous normalizing flows (CNFs) construct invertible mappings between an arbitrary complex distribution and an isotropic Gaussian distribution.
Training CNFs has not been tractable on large datasets due to the increasing complexity of neural ODE training.
In this paper, a temporal optimization is proposed that optimizes the evolutionary time of the forward propagation in neural ODE training.
- Score: 12.168241245313164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous normalizing flows (CNFs) construct invertible mappings between an
arbitrary complex distribution and an isotropic Gaussian distribution using
Neural Ordinary Differential Equations (neural ODEs). Training CNFs has not
been tractable on large datasets due to the increasing complexity of neural ODE
training. In recent works, Optimal Transport theory has been applied to
regularize the dynamics of the ODE to speed up training. In this paper, a
temporal optimization is proposed that optimizes the evolutionary time of the
forward propagation in neural ODE training. In this approach, we optimize the
network weights of the CNF alternately with the evolutionary time by coordinate
descent. Furthermore, temporal regularization ensures the stability of the
evolution. This approach can be used in conjunction with the original
regularization approach. We experimentally demonstrate that the proposed
approach significantly accelerates training without sacrificing performance
relative to baseline models.
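As a concrete reading of the abstract, the sketch below alternates coordinate-descent updates between the CNF weights and a learnable evolutionary time T, with a simple penalty on T standing in for the temporal regularizer. This is a minimal illustration only, assuming a fixed-step Euler integrator, an exact Jacobian trace (viable in low dimensions), and illustrative names and hyperparameters (Dynamics, cnf_nll, lambda_T); it is not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): alternating coordinate-descent
# updates of the CNF weights and of the evolutionary time T, with a simple
# penalty on T standing in for the temporal regularizer.
import math
import torch

class Dynamics(torch.nn.Module):
    """Velocity field f(z, t) of the neural ODE (toy fully-connected net)."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, dim))

    def forward(self, z, t):
        t_col = t * torch.ones_like(z[:, :1])            # broadcast scalar time
        return self.net(torch.cat([z, t_col], dim=1))

def cnf_nll(f, x, T, steps=20):
    """Negative log-likelihood of the CNF, integrating from 0 to T with
    fixed-step Euler and an exact Jacobian trace (fine in low dimensions)."""
    dim = x.shape[1]
    z = x.detach().requires_grad_(True)
    delta_logp = torch.zeros(x.shape[0], device=x.device)
    dt = T / steps
    for k in range(steps):
        dz = f(z, k * dt)
        trace = sum(torch.autograd.grad(dz[:, i].sum(), z, create_graph=True)[0][:, i]
                    for i in range(dim))
        z = z + dt * dz
        delta_logp = delta_logp + dt * trace             # instantaneous change of variables
    prior_logp = -0.5 * (z ** 2).sum(dim=1) - 0.5 * dim * math.log(2 * math.pi)
    return -(prior_logp + delta_logp).mean()

def train(batches, dim=2, lambda_T=0.1):
    f = Dynamics(dim)
    log_T = torch.zeros((), requires_grad=True)          # evolutionary time T = exp(log_T) > 0
    opt_theta = torch.optim.Adam(f.parameters(), lr=1e-3)
    opt_time = torch.optim.Adam([log_T], lr=1e-2)
    for x in batches:
        # (a) update the network weights with the evolutionary time held fixed (detached)
        loss = cnf_nll(f, x, log_T.exp().detach())
        opt_theta.zero_grad()
        loss.backward()
        opt_theta.step()
        # (b) update only the evolutionary time; the penalty lambda_T * T
        #     plays the role of a temporal regularizer
        T = log_T.exp()
        loss_t = cnf_nll(f, x, T) + lambda_T * T
        opt_time.zero_grad()
        loss_t.backward()
        opt_time.step()
    return f, log_T.exp().item()
```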
Related papers
- Tensor-Valued Time and Inference Path Optimization in Differential Equation-Based Generative Modeling [16.874769609089764]
This work introduces, for the first time, a tensor-valued time that expands the conventional scalar-valued time into multiple dimensions.
We also propose a novel path optimization problem designed to adaptively determine multidimensional inference trajectories.
arXiv Detail & Related papers (2024-04-22T13:20:01Z)
- Speed Limits for Deep Learning [67.69149326107103]
Recent advancement in thermodynamics allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
- Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free Trainability [0.0]
We study gradient descent (GD)-based sparse training and evolution strategies (ES).
We find that ES explore diverse and flat local optima and do not preserve linear mode connectivity across sparsity levels and independent runs.
arXiv Detail & Related papers (2023-05-31T15:58:54Z)
- Manifold Interpolating Optimal-Transport Flows for Trajectory Inference [64.94020639760026]
We present a method called Manifold Interpolating Optimal-Transport Flow (MIOFlow).
MIOFlow learns continuous population dynamics from static snapshot samples taken at sporadic timepoints.
We evaluate our method on simulated data with bifurcations and merges, as well as scRNA-seq data from embryoid body differentiation, and acute myeloid leukemia treatment.
arXiv Detail & Related papers (2022-06-29T22:19:03Z)
- Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for Inverse Problems [97.64313409741614]
We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators.
We propose to do posterior sampling in the latent space of a pre-trained generative model.
arXiv Detail & Related papers (2022-06-18T03:47:37Z)
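The sketch below illustrates the posterior-sampling idea from the Score-Guided Intermediate Layer Optimization entry above in its most generic form: unadjusted Langevin dynamics on the latent code of a pre-trained generator. The generator G, the forward operator, the noise level sigma, and the step size eta are placeholder assumptions, and the paper's actual score-guided, intermediate-layer scheme is not reproduced here.

```python
# Illustrative sketch only: unadjusted Langevin dynamics in the latent space of
# a pre-trained generator G, targeting p(z | y) for an observation
# y ~ N(forward_op(G(z)), sigma^2 I). G, forward_op, sigma and eta are
# placeholder assumptions, not the paper's score-guided intermediate layer scheme.
import math
import torch

def latent_langevin(G, forward_op, y, z_dim, sigma=0.1, eta=1e-3, n_steps=1000):
    z = torch.randn(z_dim, requires_grad=True)
    for _ in range(n_steps):
        # log posterior up to a constant: Gaussian likelihood + standard normal prior on z
        log_lik = -((forward_op(G(z)) - y) ** 2).sum() / (2 * sigma ** 2)
        log_prior = -0.5 * (z ** 2).sum()
        grad = torch.autograd.grad(log_lik + log_prior, z)[0]
        with torch.no_grad():                            # Langevin update on z
            z += 0.5 * eta * grad + math.sqrt(eta) * torch.randn_like(z)
    return z.detach()
```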
- Second-Order Neural ODE Optimizer [11.92713188431164]
We show that a specific continuous-time OC methodology, called Differential Programming, can be adopted to derive backward ODEs for higher-order derivatives at the same O(1) memory cost.
The resulting method converges much faster than first-order baselines in wall-clock time.
Our framework also enables direct architecture optimization, such as the integration time of Neural ODEs, with second-order feedback policies.
arXiv Detail & Related papers (2021-09-29T02:58:18Z)
- Influence Estimation and Maximization via Neural Mean-Field Dynamics [60.91291234832546]
We propose a novel learning framework using neural mean-field (NMF) dynamics for inference and estimation problems.
Our framework can simultaneously learn the structure of the diffusion network and the evolution of node infection probabilities.
arXiv Detail & Related papers (2021-06-03T00:02:05Z)
- A Differential Game Theoretic Neural Optimizer for Training Residual Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers.
The resulting optimal control representation admits a game-theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
arXiv Detail & Related papers (2020-07-17T10:19:17Z)
- An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z)
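A toy illustration of the mechanism named in the "An Ode to an ODE" entry above: if a weight matrix evolves under a flow generated by skew-symmetric matrices, it stays on the orthogonal group O(d). The discretization by matrix exponentials below is an illustrative assumption, not the paper's ODEtoODE construction.

```python
# Toy illustration (not the ODEtoODE architecture): evolving a weight matrix on
# the orthogonal group O(d). A skew-symmetric generator A keeps W orthogonal,
# since each update multiplies W by the orthogonal matrix exp(dt * A).
import torch

def skew(M):
    """Project an arbitrary square matrix onto the skew-symmetric matrices."""
    return 0.5 * (M - M.T)

def evolve_on_orthogonal_group(d=8, steps=100, dt=0.05, seed=0):
    g = torch.Generator().manual_seed(seed)
    W = torch.eye(d)                                   # start on O(d)
    for _ in range(steps):
        A = skew(torch.randn(d, d, generator=g))       # time-dependent generator
        W = W @ torch.matrix_exp(dt * A)               # exp of a skew-symmetric matrix is orthogonal
    return W

W = evolve_on_orthogonal_group()
print(torch.dist(W.T @ W, torch.eye(W.shape[0])))      # ~0: orthogonality is preserved
```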
- Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
arXiv Detail & Related papers (2020-06-08T09:53:35Z)
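To make the "networks of linear first-order dynamical systems" in the Liquid Time-constant Networks entry concrete, below is a sketch of a liquid time-constant cell with a fused semi-implicit update. The state equation and solver step are quoted from memory and should be treated as an approximation of the paper's formulation; the gate network and hyperparameters are assumptions.

```python
# Sketch (quoted from memory, verify against the LTC paper): a liquid
# time-constant cell. The state is assumed to follow
#   dx/dt = -(1/tau + f(x, I)) * x + f(x, I) * A,
# i.e. a linear first-order system whose effective time constant is modulated
# by a learned nonlinear gate f. A fused semi-implicit Euler step keeps the
# update stable and bounded.
import torch

class LTCCell(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.gate = torch.nn.Sequential(
            torch.nn.Linear(input_dim + hidden_dim, hidden_dim), torch.nn.Sigmoid())
        self.tau = torch.nn.Parameter(torch.ones(hidden_dim))      # per-unit time constants
        self.A = torch.nn.Parameter(torch.zeros(hidden_dim))       # per-unit bias targets

    def forward(self, x, inp, dt=0.1):
        f = self.gate(torch.cat([inp, x], dim=-1))
        # fused explicit/implicit step of dx/dt = -(1/tau + f) x + f A
        return (x + dt * f * self.A) / (1.0 + dt * (1.0 / self.tau + f))

cell = LTCCell(input_dim=3, hidden_dim=8)
x = torch.zeros(1, 8)
for _ in range(5):
    x = cell(x, torch.randn(1, 3))                                  # unrolled over time steps
```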
- How to train your neural ODE: the world of Jacobian and kinetic regularization [7.83405844354125]
Training neural ODEs on large datasets has not been tractable due to the necessity of allowing the adaptive numerical ODE solver to refine its step size to very small values.
We introduce a theoretically-grounded combination of both optimal transport and stability regularizations which encourage neural ODEs to prefer simpler dynamics out of all the dynamics that solve a problem well.
arXiv Detail & Related papers (2020-02-07T14:15:02Z)
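For comparison with the temporal optimization above, the sketch below accumulates the two regularizers from the last entry, the kinetic energy int ||f||^2 dt and a Hutchinson estimate of int ||df/dz||_F^2 dt, along a fixed-step Euler trajectory. The integrator, probe distribution, and regularization weights are illustrative assumptions; the paper pairs these penalties with an adaptive solver.

```python
# Hedged sketch of the kinetic-energy and Jacobian-Frobenius regularizers from
# "How to train your neural ODE", accumulated with a fixed-step Euler
# integrator (the paper uses an adaptive solver and adds these to the CNF loss).
import torch

def regularized_trajectory(f, x, T=1.0, steps=20):
    """Integrate dz/dt = f(z, t) from 0 to T and return z(T) together with
    estimates of int ||f||^2 dt and int ||df/dz||_F^2 dt (Hutchinson trick)."""
    z = x.detach().requires_grad_(True)
    kinetic = torch.zeros((), device=x.device)
    frobenius = torch.zeros((), device=x.device)
    dt = T / steps
    for k in range(steps):
        dz = f(z, k * dt)
        eps = torch.randn_like(dz)                       # Hutchinson probe vector
        vjp = torch.autograd.grad(dz, z, grad_outputs=eps, create_graph=True)[0]
        kinetic = kinetic + dt * (dz ** 2).sum(dim=1).mean()
        frobenius = frobenius + dt * (vjp ** 2).sum(dim=1).mean()
        z = z + dt * dz
    return z, kinetic, frobenius

# Usage with a trivial linear field standing in for a neural velocity field;
# in practice the two penalties are added to the CNF negative log-likelihood,
# e.g. loss = nll + 0.01 * kinetic + 0.01 * frobenius (weights are assumptions).
f = lambda z, t: -z
z_T, kinetic, frobenius = regularized_trajectory(f, torch.randn(16, 2))
```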
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.