Related papers: HomoFM: Deep Homography Estimation with Flow Matching

URL: http://arxiv.org/abs/2601.18222v1
Date: Mon, 26 Jan 2026 07:17:32 GMT
Title: HomoFM: Deep Homography Estimation with Flow Matching
Authors: Mengfan He, Liangzheng Sun, Chunyu Li, Ziyang Meng
Abstract summary: HomoFM is a new framework that introduces the flow matching technique from generative modeling into the homography estimation task. We show that HomoFM outperforms state-of-the-art methods in both estimation accuracy and robustness on standard benchmarks.
Score: 2.0260360833154913
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep homography estimation has broad applications in computer vision and robotics. Remarkable progress has been achieved, but existing methods typically treat it as a direct regression or iterative refinement problem and often struggle to capture complex geometric transformations or to generalize across different domains. In this work, we propose HomoFM, a new framework that introduces the flow matching technique from generative modeling into the homography estimation task for the first time. Unlike existing methods, we formulate homography estimation as a velocity field learning problem. By modeling a continuous, point-wise velocity field that transforms noisy distributions into registered coordinates, the proposed network recovers high-precision transformations through a conditional flow trajectory. Furthermore, to address domain shift, e.g., in multimodal matching or under varying illumination, we integrate a gradient reversal layer (GRL) into the feature extraction backbone. This domain adaptation strategy explicitly constrains the encoder to learn domain-invariant representations, significantly enhancing the network's robustness. Extensive experiments demonstrate the effectiveness of the proposed method, showing that HomoFM outperforms state-of-the-art methods in both estimation accuracy and robustness on standard benchmarks. Code and data resources are available at https://github.com/hmf21/HomoFM.
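
The velocity-field formulation described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it assumes a straight-line interpolation path between noise and target (a common choice in flow matching that the abstract does not confirm), uses made-up coordinate values, and replaces the trained network with a hypothetical oracle_velocity function so that the example runs without any learning framework.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from x0 (noise) to x1 (target) at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path; it is constant in t for this path choice."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predicted_v, x0, x1):
    """Mean squared error between a predicted velocity and the path velocity."""
    tgt = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_v, tgt)) / len(tgt)

def euler_integrate(velocity_fn, x0, steps=10):
    """Follow a velocity field from t=0 to t=1 with fixed-step Euler updates."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical data: four 2D points flattened to eight numbers.
rng = random.Random(0)
registered = [3.0, -1.5, 2.0, 4.0, -2.5, 0.5, 1.0, -3.0]
noise = [rng.gauss(0.0, 1.0) for _ in registered]

def oracle_velocity(x, t):
    # Stand-in for a trained network (an assumption for this sketch): the exact
    # conditional velocity pointing from the current state to the target.
    return [(b - xi) / (1.0 - t) for xi, b in zip(x, registered)]

recovered = euler_integrate(oracle_velocity, noise, steps=10)
print(recovered)
```

In a learned setting, the oracle would be replaced by a network trained to minimize flow_matching_loss at randomly sampled times t, conditioned on image features; how HomoFM does this is specified in the paper and its repository, not here.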

Related papers

Simulating Distribution Dynamics: Liquid Temporal Feature Evolution for Single-Domain Generalized Object Detection [58.25418970608328]
Single-Domain Generalized Object Detection (Single-DGOD) aims to transfer a detector trained on one source domain to multiple unknown domains. Existing methods for Single-DGOD typically rely on discrete data augmentation or static perturbation methods to expand data diversity. We propose a new method, which simulates the progressive evolution of features from the source domain to simulated latent distributions.
arXiv Detail & Related papers (2025-11-13T03:10:39Z)
Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds. Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures. We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z)
PreAdaptFWI: Pretrained-Based Adaptive Residual Learning for Full-Waveform Inversion Without Dataset Dependency [8.719356558714246]
Full-waveform inversion (FWI) is a method that utilizes seismic data to invert the physical parameters of subsurface media. Due to its ill-posed nature, FWI is susceptible to getting trapped in local minima. Various research efforts have attempted to combine neural networks with FWI to stabilize the inversion process.
arXiv Detail & Related papers (2025-02-17T15:30:17Z)
MAP-based Problem-Agnostic diffusion model for Inverse Problems [8.161067848524976]
We propose a problem-agnostic diffusion model called the maximum a posteriori (MAP)-based guided term estimation method for inverse problems. This innovation allows us to better capture the intrinsic properties of the data, leading to improved performance.
arXiv Detail & Related papers (2025-01-25T08:30:15Z)
Boundless Across Domains: A New Paradigm of Adaptive Feature and Cross-Attention for Domain Generalization in Medical Image Segmentation [1.93061220186624]
Domain-invariant representation learning is a powerful method for domain generalization. Previous approaches face challenges such as high computational demands, training instability, and limited effectiveness with high-dimensional data. We propose an Adaptive Feature Blending (AFB) method that generates out-of-distribution samples while exploring the in-distribution space.
arXiv Detail & Related papers (2024-11-22T12:06:24Z)
AbHE: All Attention-based Homography Estimation [0.0]
We propose a strong-baseline model based on the Swin Transformer, which combines a convolutional neural network for local features with a transformer module for global features. In the homography regression stage, we adopt an attention layer for the channels of the correlation volume, which can drop out some weak correlation feature points. Experiments show that our method outperforms the state-of-the-art method in 8-degree-of-freedom (DOF) homography estimation.
arXiv Detail & Related papers (2022-12-06T15:00:00Z)
Poseur: Direct Human Pose Regression with Transformers [119.79232258661995]
We propose a direct, regression-based approach to 2D human pose estimation from single images. Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints. Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
arXiv Detail & Related papers (2022-01-19T04:31:57Z)
Homography Decomposition Networks for Planar Object Tracking [11.558401177707312]
Planar object tracking plays an important role in AI applications, such as robotics, visual servoing, and visual SLAM. We propose a novel Homography Decomposition Networks (HDN) approach that drastically reduces and stabilizes the condition number by decomposing the homography transformation into two groups.
arXiv Detail & Related papers (2021-12-15T06:13:32Z)
Learning Discriminative Shrinkage Deep Networks for Image Deconvolution [122.79108159874426]
We propose an effective non-blind deconvolution approach by learning discriminative shrinkage functions to implicitly model these terms. Experimental results show that the proposed method performs favorably against the state-of-the-art ones in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-11-27T12:12:57Z)
Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning method based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem. CoGD is introduced to solve bilinear problems when one variable has a sparsity constraint. It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z)
LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation [52.63874513999119]
Cross-resolution image alignment is a key problem in multiscale giga photography. Existing deep homography methods neglect the explicit formulation of correspondences between them, which leads to degraded accuracy in cross-resolution challenges. We propose a local transformer network embedded within a multiscale structure to explicitly learn correspondences between the multimodal inputs.
arXiv Detail & Related papers (2021-06-08T02:51:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.