Towards Generalized Multimodal Homography Estimation
- URL: http://arxiv.org/abs/2603.03956v1
- Date: Wed, 04 Mar 2026 11:35:56 GMT
- Title: Towards Generalized Multimodal Homography Estimation
- Authors: Jinkun You, Jiaxin Cheng, Jie Zhang, Yicong Zhou
- Abstract summary: Supervised and unsupervised homography estimation methods depend on image pairs tailored to specific modalities to achieve high accuracy. We propose a training data synthesis method that generates unaligned image pairs with ground-truth offsets from a single input image. Our approach renders the image pairs with diverse textures and colors while preserving their structural information.
- Score: 43.13726458321087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised and unsupervised homography estimation methods depend on image pairs tailored to specific modalities to achieve high accuracy. However, their performance deteriorates substantially when applied to unseen modalities. To address this issue, we propose a training data synthesis method that generates unaligned image pairs with ground-truth offsets from a single input image. Our approach renders the image pairs with diverse textures and colors while preserving their structural information. These synthetic data empower the trained model to achieve greater robustness and improved generalization across various domains. Additionally, we design a network to fully leverage cross-scale information and decouple color information from feature representations, thus improving estimation accuracy. Extensive experiments show that our training data synthesis method improves generalization performance. The results also confirm the effectiveness of the proposed network.
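The synthesis idea in the abstract (unaligned image pairs with ground-truth offsets from a single image) resembles the standard deep-homography recipe: crop a reference patch, randomly perturb its four corners, and solve for the homography that realizes those offsets. The paper's exact pipeline (including its texture/color rendering) is not specified here, so the sketch below is an assumption-laden illustration of that general recipe only; all function names and parameters are illustrative, using plain NumPy:

```python
import numpy as np

def solve_homography(src, dst):
    """Direct Linear Transform: 3x3 homography mapping 4 src points to 4 dst points."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(rows, dtype=np.float64)
    _, _, vt = np.linalg.svd(A)        # null vector = last right-singular vector
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]                 # normalize so H[2,2] == 1

def synthesize_pair(image, patch=128, max_offset=32, seed=None):
    """Return (corners, offsets, H): a reference patch's corners, random
    ground-truth corner offsets, and the homography relating the two views."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    # top-left of the reference patch, leaving room for the perturbation
    x0 = int(rng.integers(max_offset, w - patch - max_offset))
    y0 = int(rng.integers(max_offset, h - patch - max_offset))
    corners = np.array([[x0, y0], [x0 + patch, y0],
                        [x0 + patch, y0 + patch], [x0, y0 + patch]], dtype=np.float64)
    offsets = rng.uniform(-max_offset, max_offset, size=(4, 2))
    H = solve_homography(corners, corners + offsets)
    return corners, offsets, H

# The second training image would be `image` warped by H (e.g. with
# cv2.warpPerspective), with `offsets` as the regression target.
img = np.zeros((240, 320, 3), dtype=np.uint8)
corners, offsets, H = synthesize_pair(img, seed=0)
proj = (H @ np.c_[corners, np.ones(4)].T).T
proj = proj[:, :2] / proj[:, 2:3]
print(np.allclose(proj, corners + offsets))  # homography reproduces the GT offsets
```

Because both patches come from one source image, the ground-truth homography is exact by construction; the modality-robustness claimed in the abstract would then come from re-rendering the pair's textures and colors while keeping this geometry fixed.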
Related papers
- Infusing fine-grained visual knowledge to Vision-Language Models [5.487134463783365]
Large-scale contrastive pre-training produces powerful Vision-and-Language Models (VLMs). We propose a fine-tuning method explicitly designed to achieve an optimal balance between fine-grained domain adaptation and retention of the pretrained VLM's broad multimodal knowledge. Our approach consistently achieves strong results, notably retaining the visual-text alignment without utilizing any text data or the original text encoder during fine-tuning.
arXiv Detail & Related papers (2025-08-16T19:12:09Z) - Boosting Zero-shot Stereo Matching using Large-scale Mixed Images Sources in the Real World [8.56549004133167]
Stereo matching methods rely on dense pixel-wise ground truth labels. The scarcity of labeled data and domain gaps between synthetic and real-world images pose notable challenges. We propose a novel framework, BooSTer, that leverages both vision foundation models and large-scale mixed image sources.
arXiv Detail & Related papers (2025-05-13T14:24:38Z) - Adversarial Semantic Augmentation for Training Generative Adversarial Networks under Limited Data [27.27230943686822]
We propose an adversarial semantic augmentation (ASA) technique to enlarge the training data at the semantic level instead of the image level. Our method consistently improves synthesis quality under various data regimes.
arXiv Detail & Related papers (2025-02-02T13:50:38Z) - GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing [60.101097709212716]
This paper introduces GenMix, a generalizable prompt-guided generative data augmentation approach. Our technique leverages image editing to generate augmented images based on custom conditional prompts. Our approach mitigates unrealistic images and label ambiguity, improving the performance and adversarial robustness of the resulting models.
arXiv Detail & Related papers (2024-12-03T10:45:34Z) - Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z) - Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z) - Learning from Synthetic Data for Visual Grounding [55.21937116752679]
We show that SynGround can improve the localization capabilities of off-the-shelf vision-and-language models. Data generated with SynGround improves the pointing game accuracy of pretrained ALBEF and BLIP models by 4.81% and 17.11% absolute percentage points, respectively.
arXiv Detail & Related papers (2024-03-20T17:59:43Z) - Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.