Gradient Flow Matching for Learning Update Dynamics in Neural Network Training
- URL: http://arxiv.org/abs/2505.20221v1
- Date: Mon, 26 May 2025 17:03:22 GMT
- Title: Gradient Flow Matching for Learning Update Dynamics in Neural Network Training
- Authors: Xiao Shou, Yanna Ding, Jianxi Gao
- Abstract summary: Gradient Flow Matching (GFM) is a continuous-time modeling framework that treats neural network training as a dynamical system governed by learned optimizer-aware vector fields. By leveraging conditional flow matching, GFM captures the underlying update rules of SGD, Adam, and RMSprop, enabling smooth extrapolation of weight trajectories toward convergence.
- Score: 3.782392436834913
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training deep neural networks remains computationally intensive due to the iterative nature of gradient-based optimization. We propose Gradient Flow Matching (GFM), a continuous-time modeling framework that treats neural network training as a dynamical system governed by learned optimizer-aware vector fields. By leveraging conditional flow matching, GFM captures the underlying update rules of optimizers such as SGD, Adam, and RMSprop, enabling smooth extrapolation of weight trajectories toward convergence. Unlike black-box sequence models, GFM incorporates structural knowledge of gradient-based updates into the learning objective, facilitating accurate forecasting of final weights from partial training sequences. Empirically, GFM achieves forecasting accuracy that is competitive with Transformer-based models and significantly outperforms LSTM and other classical baselines. Furthermore, GFM generalizes across neural architectures and initializations, providing a unified framework for studying optimization dynamics and accelerating convergence prediction.
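As a rough illustration of the idea (not the paper's implementation), conditional flow matching on weight trajectories can be sketched as follows: collect an SGD trajectory on a toy quadratic loss, regress a vector field onto the velocity of a straight-line interpolant between an early checkpoint and the (near-)converged weights, then integrate the learned field to extrapolate. The quadratic task, the linear vector field, and all variable names are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task (assumed for illustration): an SGD weight trajectory
# on the quadratic loss 0.5 * w^T A w, whose gradient is A @ w.
A = np.diag([3.0, 1.0])
w = np.array([4.0, -2.0])
lr = 0.05
traj = [w.copy()]
for _ in range(100):
    w = w - lr * (A @ w)
    traj.append(w.copy())
traj = np.array(traj)

# Conditional flow matching with a straight-line interpolant:
# sample s ~ U[0, 1], form w_s = (1 - s) w0 + s w1, and regress the
# vector field onto the constant conditional velocity u = w1 - w0.
w0, w1 = traj[10], traj[-1]            # early checkpoint, final weights
s = rng.uniform(size=(256, 1))
ws = (1 - s) * w0 + s * w1             # points along the interpolant
u = np.tile(w1 - w0, (256, 1))         # target velocity at each point

# Fit a linear vector field v(w) = W w + b by least squares
# (a stand-in for the learned model in the paper).
X = np.hstack([ws, np.ones((256, 1))])
theta, *_ = np.linalg.lstsq(X, u, rcond=None)

# Forecast: integrate the learned field from w0 with Euler steps
# over unit time, extrapolating the trajectory toward convergence.
wt = w0.copy()
for _ in range(100):
    v = np.hstack([wt, 1.0]) @ theta
    wt = wt + 0.01 * v
print(np.round(wt, 4))
```

On this toy problem the target velocity is constant along the interpolant, so the fitted field transports the early checkpoint essentially onto the final weights; the paper's contribution is doing this with learned, optimizer-aware fields over real network weight trajectories.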
Related papers
- Towards Improving Long-Tail Entity Predictions in Temporal Knowledge Graphs through Global Similarity and Weighted Sampling [53.11315884128402]
Temporal Knowledge Graph (TKG) completion models traditionally assume access to the entire graph during training. We present an incremental training framework specifically designed for TKGs, aiming to address entities that are either not observed during training or have sparse connections. Our approach combines a model-agnostic enhancement layer with a weighted sampling strategy that can augment and improve any existing TKG completion method.
arXiv Detail & Related papers (2025-07-25T06:02:48Z) - Trainable Adaptive Activation Function Structure (TAAFS) Enhances Neural Network Force Field Performance with Only Dozens of Additional Parameters [0.0]
Trainable Adaptive Activation Function Structure (TAAFS) is a method that selects distinct mathematical formulations for non-linear activations. In this study, we integrate TAAFS into a variety of neural network models, resulting in observed accuracy improvements.
arXiv Detail & Related papers (2024-12-19T09:06:39Z) - Graph Neural Networks and Differential Equations: A hybrid approach for data assimilation of fluid flows [0.0]
This study presents a novel hybrid approach that combines Graph Neural Networks (GNNs) with Reynolds-Averaged Navier Stokes (RANS) equations.
The results demonstrate significant improvements in the accuracy of the reconstructed mean flow compared to purely data-driven models.
arXiv Detail & Related papers (2024-11-14T14:31:52Z) - Gradient-free variational learning with conditional mixture networks [39.827869318925494]
We introduce CAVI-CMN, a fast, gradient-free variational method for training conditional mixture networks (CMNs). CAVI-CMN achieves competitive and often superior predictive accuracy compared to maximum likelihood estimation (MLE) with backpropagation. As input size or the number of experts increases, computation time scales competitively with MLE.
arXiv Detail & Related papers (2024-08-29T10:43:55Z) - Enhancing Graph U-Nets for Mesh-Agnostic Spatio-Temporal Flow Prediction [2.3964255330849356]
We explore the potential of Graph U-Nets for unsteady flow-field prediction.
We propose novel approaches to improve the mesh-agnostic spatio-temporal robustness of transient flow-field prediction using Graph U-Nets.
Key enhancements to the Graph U-Net architecture provide increased flexibility in modeling node dynamics.
arXiv Detail & Related papers (2024-06-06T07:01:36Z) - Variational Stochastic Gradient Descent for Deep Neural Networks [16.96187187108041]
Variational Stochastic Gradient Descent (VSGD) is an efficient and effective gradient-based optimization method. We show that VSGD outperforms Adam and SGD on two classification datasets and four deep neural network architectures.
arXiv Detail & Related papers (2024-04-09T18:02:01Z) - Are GATs Out of Balance? [73.2500577189791]
We study the Graph Attention Network (GAT) in which a node's neighborhood aggregation is weighted by parameterized attention coefficients.
Our main theorem serves as a stepping stone to studying the learning dynamics of positive homogeneous models with attention mechanisms.
arXiv Detail & Related papers (2023-10-11T06:53:05Z) - Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback Propagation (LFP) is a novel training principle for neural-network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions. Our method then implements a greedy approach, reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - ConCerNet: A Contrastive Learning Based Framework for Automated
Conservation Law Discovery and Trustworthy Dynamical System Prediction [82.81767856234956]
This paper proposes a new learning framework named ConCerNet to improve the trustworthiness of the DNN based dynamics modeling.
We show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics.
arXiv Detail & Related papers (2023-02-11T21:07:30Z) - Predictive coding, precision and natural gradients [2.1601966913620325]
We show that hierarchical predictive coding networks with learnable precision are able to solve various supervised and unsupervised learning tasks.
When applied to unsupervised auto-encoding of image inputs, the deterministic network produces hierarchically organized and disentangled embeddings.
arXiv Detail & Related papers (2021-11-12T21:05:03Z) - A Differential Game Theoretic Neural Optimizer for Training Residual
Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers.
The resulting optimal control representation admits a game-theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
arXiv Detail & Related papers (2020-07-17T10:19:17Z) - Bayesian Graph Neural Networks with Adaptive Connection Sampling [62.51689735630133]
We propose a unified framework for adaptive connection sampling in graph neural networks (GNNs).
The proposed framework not only alleviates over-smoothing and over-fitting tendencies of deep GNNs, but also enables learning with uncertainty in graph analytic tasks with GNNs.
arXiv Detail & Related papers (2020-06-07T07:06:35Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization
Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.