Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization
- URL: http://arxiv.org/abs/2411.13036v1
- Date: Wed, 20 Nov 2024 04:56:19 GMT
- Title: Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization
- Authors: Sanghyeob Song, Jaihyun Lew, Hyemi Jang, Sungroh Yoon,
- Abstract summary: Estimating homography between two images is crucial for mid- or high-level vision tasks.
We propose AltO, an unsupervised learning framework for estimating homography in multimodal image pairs.
- Score: 32.78378595686787
- License:
- Abstract: Estimating the homography between two images is crucial for mid- or high-level vision tasks, such as image stitching and fusion. However, using supervised learning methods is often challenging or costly due to the difficulty of collecting ground-truth data. In response, unsupervised learning approaches have emerged. Most early methods, though, assume that the given image pairs are from the same camera or have minor lighting differences. Consequently, while these methods perform effectively under such conditions, they generally fail when input image pairs come from different domains, referred to as multimodal image pairs. To address these limitations, we propose AltO, an unsupervised learning framework for estimating homography in multimodal image pairs. Our method employs a two-phase alternating optimization framework, similar to Expectation-Maximization (EM), where one phase reduces the geometry gap and the other addresses the modality gap. To handle these gaps, we use Barlow Twins loss for the modality gap and propose an extended version, Geometry Barlow Twins, for the geometry gap. As a result, we demonstrate that our method, AltO, can be trained on multimodal datasets without any ground-truth data. It not only outperforms other unsupervised methods but is also compatible with various architectures of homography estimators. The source code can be found at:~\url{https://github.com/songsang7/AltO}
Related papers
- Learning from small data sets: Patch-based regularizers in inverse
problems for image reconstruction [1.1650821883155187]
Recent advances in machine learning require a huge amount of data and computer capacity to train the networks.
Our paper addresses the issue of learning from small data sets by taking patches of very few images into account.
We show how we can achieve uncertainty quantification by approximating the posterior using Langevin Monte Carlo methods.
arXiv Detail & Related papers (2023-12-27T15:30:05Z) - Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Leaning (VIL)
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z) - Parallax-Tolerant Unsupervised Deep Image Stitching [57.76737888499145]
We propose UDIS++, a parallax-tolerant unsupervised deep image stitching technique.
First, we propose a robust and flexible warp to model the image registration from global homography to local thin-plate spline motion.
To further eliminate the parallax artifacts, we propose to composite the stitched image seamlessly by unsupervised learning for seam-driven composition masks.
arXiv Detail & Related papers (2023-02-16T10:40:55Z) - Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z) - Manifold-Inspired Single Image Interpolation [17.304301226838614]
Many approaches to single image use manifold models to exploit semi-local similarity.
aliasing in the input image makes it challenging for both parts.
We propose a carefully-designed adaptive technique to remove aliasing in severely aliased regions.
This technique enables reliable identification of similar patches even in the presence of strong aliasing.
arXiv Detail & Related papers (2021-07-31T04:29:05Z) - AugNet: End-to-End Unsupervised Visual Representation Learning with
Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent the image in low dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z) - Dual Contrastive Learning for Unsupervised Image-to-Image Translation [16.759958400617947]
Unsupervised image-to-image translation tasks aim to find a mapping between a source domain X and a target domain Y from unpaired training data.
Contrastive learning for Unpaired image-to-image Translation yields state-of-the-art results.
We propose a novel method based on contrastive learning and a dual learning setting to infer an efficient mapping between unpaired data.
arXiv Detail & Related papers (2021-04-15T18:00:22Z) - Multi-temporal and multi-source remote sensing image classification by
nonlinear relative normalization [17.124438150480326]
We study a methodology that aligns data from different domains in a nonlinear way through em kernelization.
We successfully test KEMA in multi-temporal and multi-source very high resolution classification tasks, as well as on the task of making a model invariant to shadowing for hyperspectral imaging.
arXiv Detail & Related papers (2020-12-07T08:46:11Z) - An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human
Pose Estimation [80.02124918255059]
Semi-supervised learning aims to boost the accuracy of a model by exploring unlabeled images.
We learn two networks to mutually teach each other.
The more reliable predictions on easy images in each network are used to teach the other network to learn about the corresponding hard images.
arXiv Detail & Related papers (2020-11-25T03:29:52Z) - Recurrent Multi-view Alignment Network for Unsupervised Surface
Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z) - Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation
Learning [108.999497144296]
Recently advanced unsupervised learning approaches use the siamese-like framework to compare two "views" from the same image for learning representations.
This work aims to involve the distance concept on label space in the unsupervised learning and let the model be aware of the soft degree of similarity between positive or negative pairs.
Despite its conceptual simplicity, we show empirically that with the solution -- Unsupervised image mixtures (Un-Mix), we can learn subtler, more robust and generalized representations from the transformed input and corresponding new label space.
arXiv Detail & Related papers (2020-03-11T17:59:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.