Self-Supervised Learning of Image Scale and Orientation
- URL: http://arxiv.org/abs/2206.07259v1
- Date: Wed, 15 Jun 2022 02:43:39 GMT
- Title: Self-Supervised Learning of Image Scale and Orientation
- Authors: Jongmin Lee, Yoonwoo Jeong, Minsu Cho
- Abstract summary: We study the problem of learning to assign a characteristic pose, i.e., scale and orientation, for an image region of interest.
It is hard to obtain a large-scale set of image regions with explicit pose annotations that a model directly learns from.
We propose a self-supervised learning framework with a histogram alignment technique.
- Score: 35.94215211409985
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of learning to assign a characteristic pose, i.e., scale
and orientation, for an image region of interest. Despite its apparent
simplicity, the problem is non-trivial; it is hard to obtain a large-scale set
of image regions with explicit pose annotations that a model directly learns
from. To tackle the issue, we propose a self-supervised learning framework with
a histogram alignment technique. It generates pairs of image patches by random
rescaling/rotating and then trains an estimator to predict their
scale/orientation values so that their relative difference is consistent with
the rescaling/rotating used. The estimator learns to predict a non-parametric
histogram distribution of scale/orientation without any supervision.
Experiments show that it significantly outperforms previous methods in
scale/orientation estimation and also improves image matching and 6 DoF camera
pose estimation by incorporating our patch poses into a matching process.
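The consistency idea in the abstract can be illustrated for orientation with a small sketch. This is not the authors' implementation; it only assumes that an orientation histogram has a fixed number of bins over 360 degrees, that rotating a patch by a known number of bins should circularly shift the predicted histogram by the same amount, and that the mismatch is measured with a cross-entropy. The names `shift_histogram` and `alignment_loss`, the bin count, and the random logits standing in for estimator outputs are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a histogram that sums to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shift_histogram(hist, shift_bins):
    """Circularly shift a histogram: bin i moves to bin (i + shift_bins) % n."""
    n = len(hist)
    return [hist[(i - shift_bins) % n] for i in range(n)]


def alignment_loss(hist_a, hist_b, shift_bins, eps=1e-8):
    """Cross-entropy between patch A's histogram, shifted by the known
    rotation, and patch B's histogram (assumed loss, for illustration)."""
    target = shift_histogram(hist_a, shift_bins)
    return -sum(t * math.log(p + eps) for t, p in zip(target, hist_b))


def argmax(values):
    return values.index(max(values))


random.seed(0)
num_bins = 36  # assumption: 10 degrees per bin
shift = 5      # patch B is patch A rotated by 5 bins (50 degrees)

# Random logits stand in for the estimator's output on patch A.
hist_a = softmax([random.gauss(0.0, 1.0) for _ in range(num_bins)])

# A prediction for patch B that is consistent with the rotation,
# and one that is off by 9 extra bins.
hist_b_good = shift_histogram(hist_a, shift)
hist_b_bad = shift_histogram(hist_a, shift + 9)

loss_good = alignment_loss(hist_a, hist_b_good, shift)
loss_bad = alignment_loss(hist_a, hist_b_bad, shift)
recovered_shift = (argmax(hist_b_good) - argmax(hist_a)) % num_bins
```

In this sketch the consistent prediction yields the lower loss, and the difference between the two histogram peaks recovers the applied rotation. Scale is not periodic, so a scale histogram would presumably need a non-circular shift; that case is left out here.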
Related papers
- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
arXiv Detail & Related papers (2024-07-11T05:46:35Z)
- Learning to Rank Patches for Unbiased Image Redundancy Reduction [80.93989115541966]
Images suffer from heavy spatial redundancy because pixels in neighboring regions are spatially correlated.
Existing approaches strive to overcome this limitation by reducing less meaningful image regions.
We propose a self-supervised framework for image redundancy reduction called Learning to Rank Patches.
arXiv Detail & Related papers (2024-03-31T13:12:41Z)
- FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation [30.710296843150832]
Estimating relative camera poses between images has been a central problem in computer vision.
We show how to combine the best of both methods; our approach yields results that are both precise and robust.
A comprehensive analysis supports our design choices and demonstrates that our method adapts flexibly to various feature extractors and correspondence estimators.
arXiv Detail & Related papers (2024-03-05T18:59:51Z)
- Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search [66.95134080902717]
We propose a novel one-step framework, named Self-similarity driven Scale-invariant Learning (SSL).
We introduce a Multi-scale Exemplar Branch to guide the network in concentrating on the foreground and learning scale-invariant features.
Experiments on PRW and CUHK-SYSU databases demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-02-25T04:48:11Z)
- Leveraging Image Matching Toward End-to-End Relative Camera Pose Regression [13.233301155616616]
This paper proposes a generalizable, end-to-end deep learning-based method for relative pose regression between two images.
Inspired by the classical pipeline, our method leverages Image Matching (IM) as a pre-trained task for relative pose regression.
We evaluate our method on several datasets and show that it outperforms previous end-to-end methods.
arXiv Detail & Related papers (2022-11-27T22:01:47Z)
- ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation with 2 separate neural networks.
By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are not directly regressed but refined.
arXiv Detail & Related papers (2022-05-05T13:33:25Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Scale-Net: Learning to Reduce Scale Differences for Large-Scale Invariant Image Matching [7.297352404640492]
We propose a scale-difference-aware image matching method (SDAIM) that reduces image scale differences before local feature extraction.
To accurately estimate the scale ratio, we propose a covisibility-attention-reinforced matching module (CVARM) and then design a novel neural network, termed Scale-Net.
arXiv Detail & Related papers (2021-12-20T12:35:36Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)