Graph Flow Matching: Enhancing Image Generation with Neighbor-Aware Flow Fields
- URL: http://arxiv.org/abs/2505.24434v2
- Date: Wed, 04 Jun 2025 08:33:25 GMT
- Title: Graph Flow Matching: Enhancing Image Generation with Neighbor-Aware Flow Fields
- Authors: Md Shahriar Rahim Siddiqui, Moshe Eliasof, Eldad Haber,
- Abstract summary: Flow matching casts sample generation as learning a continuous-time velocity field that transports noise to data.<n>We propose Graph Flow Matching, a lightweight enhancement that decomposes the learned velocity into a reaction term.<n> operating in the latent space of a pretrained variational autoencoder.
- Score: 7.435063833417364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Flow matching casts sample generation as learning a continuous-time velocity field that transports noise to data. Existing flow matching networks typically predict each point's velocity independently, considering only its location and time along its flow trajectory, and ignoring neighboring points. However, this pointwise approach may overlook correlations between points along the generation trajectory that could enhance velocity predictions, thereby improving downstream generation quality. To address this, we propose Graph Flow Matching (GFM), a lightweight enhancement that decomposes the learned velocity into a reaction term -- any standard flow matching network -- and a diffusion term that aggregates neighbor information via a graph neural module. This reaction-diffusion formulation retains the scalability of deep flow models while enriching velocity predictions with local context, all at minimal additional computational cost. Operating in the latent space of a pretrained variational autoencoder, GFM consistently improves Fr\'echet Inception Distance (FID) and recall across five image generation benchmarks (LSUN Church, LSUN Bedroom, FFHQ, AFHQ-Cat, and CelebA-HQ at $256\times256$), demonstrating its effectiveness as a modular enhancement to existing flow matching architectures.
Related papers
- FLEX: A Backbone for Diffusion-Based Modeling of Spatio-temporal Physical Systems [51.15230303652732]
FLEX (F Low EXpert) is a backbone architecture for generative modeling of-temporal physical systems.<n>It reduces the variance of the velocity field in the diffusion model, which helps stabilize training.<n>It achieves accurate predictions for super-resolution and forecasting tasks using as few features as two reverse diffusion steps.
arXiv Detail & Related papers (2025-05-23T00:07:59Z) - Deeply Supervised Flow-Based Generative Models [16.953166973699577]
DeepFlow is a novel framework that enhances velocity representation through inter layer communication.<n>Powered by the improved deep supervision via the internal velocity alignment, DeepFlow converges 8 times faster on ImageNet with equivalent performance.<n>DeepFlow also outperforms baselines in text to image generation tasks, as evidenced by evaluations on MSCOCO and zero shot GenEval.
arXiv Detail & Related papers (2025-03-18T17:58:08Z) - Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting [13.309018047313801]
Traffic forecasting has emerged as a crucial research area in the development of smart cities.
Recent advancements in network modeling for most-temporal correlations are starting to see diminishing returns in performance.
To tackle these challenges, we introduce the Spatio-Temporal Graph Transformer (STGormer)
We design two straightforward yet effective spatial encoding methods based on the structure and integrate time position into the vanilla transformer to capture-temporal traffic patterns.
arXiv Detail & Related papers (2024-08-20T13:18:21Z) - Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z) - Wavelet-Inspired Multiscale Graph Convolutional Recurrent Network for
Traffic Forecasting [0.0]
We propose a Graph Conal Recurrent Network (WavGCRN) which combines multiscale analysis (MSA)-based method with Deep Learning (DL)-based method.
The proposed method can offer well-defined interpretability, powerful learning capability, and competitive forecasting performance on real-world traffic data sets.
arXiv Detail & Related papers (2024-01-11T16:55:48Z) - GAFlow: Incorporating Gaussian Attention into Optical Flow [62.646389181507764]
We push Gaussian Attention (GA) into the optical flow models to accentuate local properties during representation learning.
We introduce a novel Gaussian-Constrained Layer (GCL) which can be easily plugged into existing Transformer blocks.
For reliable motion analysis, we provide a new Gaussian-Guided Attention Module (GGAM)
arXiv Detail & Related papers (2023-09-28T07:46:01Z) - Temporal Aggregation and Propagation Graph Neural Networks for Dynamic
Representation [67.26422477327179]
Temporal graphs exhibit dynamic interactions between nodes over continuous time.
We propose a novel method of temporal graph convolution with the whole neighborhood.
Our proposed TAP-GNN outperforms existing temporal graph methods by a large margin in terms of both predictive performance and online inference latency.
arXiv Detail & Related papers (2023-04-15T08:17:18Z) - Multiscale Graph Neural Network Autoencoders for Interpretable
Scientific Machine Learning [0.0]
The goal of this work is to address two limitations in autoencoder-based models: latent space interpretability and compatibility with unstructured meshes.
This is accomplished here with the development of a novel graph neural network (GNN) autoencoding architecture with demonstrations on complex fluid flow applications.
arXiv Detail & Related papers (2023-02-13T08:47:11Z) - Correlating sparse sensing for large-scale traffic speed estimation: A
Laplacian-enhanced low-rank tensor kriging approach [76.45949280328838]
We propose a Laplacian enhanced low-rank tensor (LETC) framework featuring both lowrankness and multi-temporal correlations for large-scale traffic speed kriging.
We then design an efficient solution algorithm via several effective numeric techniques to scale up the proposed model to network-wide kriging.
arXiv Detail & Related papers (2022-10-21T07:25:57Z) - Residual Graph Convolutional Recurrent Networks For Multi-step Traffic
Flow Forecasting [12.223433627287605]
We propose a new Spatial-temporal forecasting model, namely the Residual Graph Convolutional Recurrent Network (RGCRN)
The model uses our proposed Residual Graph Convolutional Network (ResGCN) to capture the fine-grained spatial correlation of the traffic road network.
Our comparative experimental results on two real datasets show that RGCRN improves on average by 20.66% compared to the best baseline model.
arXiv Detail & Related papers (2022-05-03T13:23:38Z) - GMFlow: Learning Optical Flow via Global Matching [124.57850500778277]
We propose a GMFlow framework for learning optical flow estimation.
It consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.
Our new framework outperforms 32-iteration RAFT's performance on the challenging Sintel benchmark.
arXiv Detail & Related papers (2021-11-26T18:59:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.