Related papers: CodingHomo: Bootstrapping Deep Homography With Video Coding

URL: http://arxiv.org/abs/2504.12165v1
Date: Wed, 16 Apr 2025 15:18:11 GMT
Title: CodingHomo: Bootstrapping Deep Homography With Video Coding
Authors: Yike Liu, Haipeng Li, Shuaicheng Liu, Bing Zeng
Abstract summary: Homography estimation is a fundamental task in computer vision with applications in diverse fields. Recent advances in deep learning have improved homography estimation, particularly with unsupervised learning approaches. We present CodingHomo, an unsupervised framework for homography estimation.
Score: 49.69268313796418
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Homography estimation is a fundamental task in computer vision with applications in diverse fields. Recent advances in deep learning have improved homography estimation, particularly with unsupervised learning approaches, offering increased robustness and generalizability. However, accurately predicting homography, especially in complex motions, remains a challenge. In response, this work introduces a novel method leveraging video coding, particularly by harnessing inherent motion vectors (MVs) present in videos. We present CodingHomo, an unsupervised framework for homography estimation. Our framework features a Mask-Guided Fusion (MGF) module that identifies and utilizes beneficial features among the MVs, thereby enhancing the accuracy of homography prediction. Additionally, the Mask-Guided Homography Estimation (MGHE) module is presented for eliminating undesired features in the coarse-to-fine homography refinement process. CodingHomo outperforms existing state-of-the-art unsupervised methods, delivering good robustness and generalizability. The code and dataset are available at: https://github.com/liuyike422/CodingHomo

Related papers

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation [53.01486796503091]
We present Harmon, a unified autoregressive framework that harmonizes understanding and generation tasks with a shared MAR encoder. Harmon achieves state-of-the-art image generation results on the GenEval, MJHQ30K and WISE benchmarks.
arXiv Detail & Related papers (2025-03-27T20:50:38Z)
Video-based Sequential Bayesian Homography Estimation for Soccer Field Registration [0.0]
A novel Bayesian framework is proposed, which explicitly relates the homography of one video frame to the next through an affine transformation. The proposed method, Bayesian Homography Inference from Tracked Keypoints (BHITK), employs a two-stage Kalman filter and significantly improves existing methods.
arXiv Detail & Related papers (2023-11-17T07:30:00Z)
Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
The training of an efficacious deep learning model requires large data with diverse styles and qualities. A novel contrastive learning scheme is developed to equip deep learning models with better style generalization capability. The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z)
GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds [72.60362979456035]
Masked Autoencoders (MAE) are challenging to explore in large-scale 3D point clouds. We propose a Generative Decoder for MAE (GD-MAE) to automatically merge the surrounding context. We demonstrate the efficacy of the proposed method on several large-scale benchmarks: KITTI and ONCE.
arXiv Detail & Related papers (2022-12-06T14:32:55Z)
GraphMAE: Self-Supervised Masked Graph Autoencoders [52.06140191214428]
We present a masked graph autoencoder GraphMAE that mitigates issues for generative self-supervised graph learning. We conduct extensive experiments on 21 public datasets for three different graph learning tasks. The results manifest that GraphMAE--a simple graph autoencoder with our careful designs--can consistently generate outperformance over both contrastive and generative state-of-the-art baselines.
arXiv Detail & Related papers (2022-05-22T11:57:08Z)
Unsupervised Homography Estimation with Coplanarity-Aware GAN [39.477228263736905]
Estimating homography from an image pair is a fundamental problem in image alignment. HomoGAN is designed to guide unsupervised homography estimation to focus on the dominant plane. Results show that our matching error is 22% lower than the previous SOTA method.
arXiv Detail & Related papers (2022-05-08T09:26:47Z)
Improving Monocular Visual Odometry Using Learned Depth [84.05081552443693]
We propose a framework to exploit monocular depth estimation for improving visual odometry (VO). The core of our framework is a monocular depth estimation module with a strong generalization capability for diverse scenes. Compared with current learning-based VO methods, our method demonstrates a stronger generalization ability to diverse scenes.
arXiv Detail & Related papers (2022-04-04T06:26:46Z)
Bayesian Deep Learning for Graphs [6.497816402045099]
The dissertation begins with a review of the principles on which most of the methods in the field are built, followed by a study on graph classification issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks.
arXiv Detail & Related papers (2022-02-24T20:18:41Z)
Depth-Aware Multi-Grid Deep Homography Estimation with Contextual Correlation [38.95610086309832]
Homography estimation is an important task in computer vision, with applications such as image stitching, video stabilization, and camera calibration. Traditional homography estimation methods depend on the quantity and distribution of feature points, leading to poor robustness in textureless scenes. We propose a contextual correlation layer, which can capture the long-range correlation on feature maps and be flexibly integrated into a learning framework. We equip our network with depth perception capability by introducing a novel depth-aware shape-preserved loss.
arXiv Detail & Related papers (2021-07-06T10:33:12Z)
Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification [110.52328716130022]
Video-based person re-identification (re-ID) is an important research topic in computer vision. We propose a novel graph-based framework, namely Multi-Granular Hypergraph (MGH), to pursue better representational capabilities. 90.0% top-1 accuracy on MARS is achieved using MGH, outperforming the state-of-the-art schemes.
arXiv Detail & Related papers (2021-04-30T11:20:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.