UFM: A Simple Path towards Unified Dense Correspondence with Flow
- URL: http://arxiv.org/abs/2506.09278v1
- Date: Tue, 10 Jun 2025 22:32:13 GMT
- Title: UFM: A Simple Path towards Unified Dense Correspondence with Flow
- Authors: Yuchen Zhang, Nikhil Keetha, Chenwei Lyu, Bhuvan Jhamb, Yutian Chen, Yuheng Qiu, Jay Karhade, Shreyas Jha, Yaoyu Hu, Deva Ramanan, Sebastian Scherer, Wenshan Wang,
- Abstract summary: Unified Flow & Matching model (UFM) is trained on unified data for pixels that are co-visible in both source and target images. UFM is 28% more accurate than state-of-the-art flow methods.
- Score: 40.97394594672024
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dense image correspondence is central to many applications, such as visual odometry, 3D reconstruction, object association, and re-identification. Historically, dense correspondence has been tackled separately for wide-baseline scenarios and optical flow estimation, despite the common goal of matching content between two images. In this paper, we develop a Unified Flow & Matching model (UFM), which is trained on unified data for pixels that are co-visible in both source and target images. UFM uses a simple, generic transformer architecture that directly regresses the (u,v) flow. It is easier to train and more accurate for large flows compared to the typical coarse-to-fine cost volumes in prior work. UFM is 28% more accurate than state-of-the-art flow methods (Unimatch), while also having 62% less error and running 6.7x faster than dense wide-baseline matchers (RoMa). UFM is the first to demonstrate that unified training can outperform specialized approaches across both domains. This result enables fast, general-purpose correspondence and opens new directions for multi-modal, long-range, and real-time correspondence tasks.
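The abstract's framing of dense correspondence as a per-pixel (u,v) flow over co-visible pixels can be made concrete with a small sketch. This is a generic illustration with hypothetical helper names, not UFM's actual code: given a dense flow field for the source image, each source pixel's corresponding target location is its coordinate plus (u,v), and co-visibility can be approximated by whether that location falls inside the target image bounds.

```python
import numpy as np

def flow_to_correspondence(flow, src_shape, tgt_shape):
    """Given a dense (u, v) flow for a source image, return per-pixel
    target coordinates and a co-visibility mask (True where the
    correspondence lands inside the target image bounds).

    flow: array of shape (H, W, 2), channel 0 = u (horizontal),
          channel 1 = v (vertical).
    """
    h, w = src_shape
    ys, xs = np.mgrid[0:h, 0:w]          # pixel grid of the source image
    tx = xs + flow[..., 0]               # corresponding x in the target
    ty = ys + flow[..., 1]               # corresponding y in the target
    th, tw = tgt_shape
    covisible = (tx >= 0) & (tx <= tw - 1) & (ty >= 0) & (ty <= th - 1)
    return np.stack([tx, ty], axis=-1), covisible
```

A model like the one described above would regress `flow` directly; this helper only shows how a flow field induces correspondences and a co-visibility mask.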
Related papers
- Improving Progressive Generation with Decomposable Flow Matching [50.63174319509629]
Decomposable Flow Matching (DFM) is a simple and effective framework for the progressive generation of visual media. On Imagenet-1k 512px, DFM achieves 35.2% improvements in FDD scores over the base architecture and 26.4% over the best-performing baseline.
arXiv Detail & Related papers (2025-06-24T17:58:02Z) - LeDiFlow: Learned Distribution-guided Flow Matching to Accelerate Image Generation [1.1847464266302488]
Flow Matching (FM) is a powerful generative modeling paradigm based on a simulation-free training objective instead of the score-based objective used in diffusion models. We present Learned Distribution-guided Flow Matching (LeDiFlow), a novel scalable method for training FM-based image generation models. Our method utilizes a State-Of-The-Art (SOTA) transformer architecture combined with latent space sampling and can be trained on a consumer workstation.
arXiv Detail & Related papers (2025-05-27T05:07:37Z) - Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution [14.57591222028278]
We present a general and simple framework, CrossFlow, for cross-modal flow matching. We show the importance of applying Variational Encoders to the input data, and introduce a method to enable Classifier-free guidance. To demonstrate the generalizability of our approach, we also show that CrossFlow is on par with or outperforms the state-of-the-art for various cross-modal / intra-modal mapping tasks.
arXiv Detail & Related papers (2024-12-19T18:59:56Z) - FLD+: Data-efficient Evaluation Metric for Generative Models [4.093503153499691]
We introduce a new metric to assess the quality of generated images that is more reliable, data-efficient, compute-efficient, and adaptable to new domains.
The proposed metric is based on normalizing flows, which allows for the computation of density (exact log-likelihood) of images from any domain.
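The change-of-variables identity behind this kind of metric (exact log-likelihood via a normalizing flow) can be illustrated with a toy 1-D affine flow; this is a generic sketch of the principle, not the FLD+ model itself:

```python
import numpy as np

def affine_flow_logpdf(x, scale, shift):
    """Exact log-likelihood under a toy 1-D normalizing flow.

    The flow maps data to the base space via z = (x - shift) / scale,
    with a standard-normal base density. By the change of variables:
        log p(x) = log N(z; 0, 1) + log |dz/dx|,  |dz/dx| = 1 / scale.
    """
    z = (x - shift) / scale
    log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))  # standard-normal log-density
    log_det = -np.log(scale)                        # log |dz/dx|
    return log_base + log_det
```

For this affine flow the result coincides with the log-density of N(shift, scale^2), which makes the exactness of the likelihood easy to check; deep flows stack many such invertible maps and sum their log-determinants.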
arXiv Detail & Related papers (2024-11-23T15:12:57Z) - SRIF: Semantic Shape Registration Empowered by Diffusion-based Image Morphing and Flow Estimation [2.336821026049481]
We propose SRIF, a novel Semantic shape Registration framework based on diffusion-based Image morphing and flow estimation.
SRIF not only achieves high-quality dense correspondences on challenging shape pairs, but also delivers smooth, semantically meaningful interpolation in between.
arXiv Detail & Related papers (2024-09-18T03:47:24Z) - Consistency Flow Matching: Defining Straight Flows with Velocity Consistency [97.28511135503176]
We introduce Consistency Flow Matching (Consistency-FM), a novel FM method that explicitly enforces self-consistency in the velocity field.
Preliminary experiments demonstrate that our Consistency-FM significantly improves training efficiency by converging 4.4x faster than consistency models.
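The "straight flow" idea that these FM papers build on can be illustrated with a toy sketch: if the velocity field equals the constant straight-line velocity between endpoints, even coarse Euler integration recovers the target exactly. The helper below is illustrative only, not any paper's actual method:

```python
import numpy as np

def integrate_straight_flow(x0, x1, steps=10):
    """Euler-integrate a flow whose velocity field is the constant
    straight-line velocity v = x1 - x0 over t in [0, 1].

    For a straight flow the integration is exact regardless of the
    number of steps, which is why straighter trajectories allow
    fewer sampling steps.
    """
    x = np.asarray(x0, dtype=float)
    v = np.asarray(x1, dtype=float) - x  # constant velocity of a straight path
    dt = 1.0 / steps
    for _ in range(steps):
        x = x + dt * v
    return x
```

Learned velocity fields are generally curved, so real samplers pay an integration-error cost per step; straightening (as in Consistency-FM and StraightFM) reduces that cost.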
arXiv Detail & Related papers (2024-07-02T16:15:37Z) - Exploring Straighter Trajectories of Flow Matching with Diffusion Guidance [66.4153984834872]
We propose Straighter trajectories of Flow Matching (StraightFM).
It straightens trajectories with a coupling strategy guided by a diffusion model at the distribution level.
It generates visually appealing images with a lower FID than diffusion and traditional flow matching methods.
arXiv Detail & Related papers (2023-11-28T06:19:30Z) - RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z) - Rethinking Coarse-to-Fine Approach in Single Image Deblurring [19.195704769925925]
We present a fast and accurate deblurring network design using a multi-input multi-output U-net.
The proposed network outperforms the state-of-the-art methods in terms of both accuracy and computational complexity.
arXiv Detail & Related papers (2021-08-11T06:37:01Z) - Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification [114.56752624945142]
We argue that the most popular random sampling method, the well-known PK sampler, is neither informative nor efficient for deep metric learning.
We propose an efficient mini-batch sampling method called Graph Sampling (GS) for large-scale metric learning.
arXiv Detail & Related papers (2021-04-04T06:44:15Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.