Semantic Pose Verification for Outdoor Visual Localization with
Self-supervised Contrastive Learning
- URL: http://arxiv.org/abs/2203.16945v1
- Date: Thu, 31 Mar 2022 11:09:38 GMT
- Title: Semantic Pose Verification for Outdoor Visual Localization with
Self-supervised Contrastive Learning
- Authors: Semih Orhan, Jose J. Guerrero, Yalin Bastanlar
- Abstract summary: We exploit semantic content to improve visual localization.
In our scenario, the database consists of gnomonic views generated from panoramic images.
We train a CNN in a self-supervised fashion with contrastive learning on a dataset of semantically segmented images.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Any city-scale visual localization system has to overcome long-term
appearance changes, such as varying illumination conditions or seasonal changes
between query and database images. Since semantic content is more robust to
such changes, we exploit semantic information to improve visual localization.
In our scenario, the database consists of gnomonic views generated from
panoramic images (e.g. Google Street View) and query images are collected with
a standard field-of-view camera at a different time. To improve localization,
we check the semantic similarity between query and database images, which is
not trivial since the position and viewpoint of the cameras do not exactly
match. To learn similarity, we propose training a CNN in a self-supervised
fashion with contrastive learning on a dataset of semantically segmented
images. Our experiments show that this semantic similarity estimation
approach works better than measuring similarity at the pixel level. Finally,
we use the semantic similarity scores to verify the retrievals obtained by a
state-of-the-art visual localization method and observe that contrastive
learning-based pose verification increases the top-1 recall to 0.90, a 2%
improvement.
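The verification step described above can be sketched as re-ranking: candidates returned by the visual localization method are scored by the similarity of their semantic embeddings to the query's, and low-scoring candidates are rejected. This is a minimal illustration, not the paper's implementation; the function names, the cosine-similarity scoring, and the `threshold` value are all assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_retrievals(query_emb, db_embs, candidate_ids, threshold=0.5):
    """Re-rank retrieval candidates by semantic-embedding similarity.

    query_emb: embedding of the query's semantic segmentation map.
    db_embs: dict mapping database image id -> embedding.
    candidate_ids: ids returned by the visual localization method.
    Returns (id, score) pairs ordered by similarity; candidates scoring
    below `threshold` (a hypothetical cutoff) are rejected.
    """
    scored = [(cid, cosine_similarity(query_emb, db_embs[cid]))
              for cid in candidate_ids]
    accepted = [(cid, s) for cid, s in scored if s >= threshold]
    accepted.sort(key=lambda x: x[1], reverse=True)
    return accepted
```

In the paper's setting the embeddings would come from the contrastively trained CNN applied to segmentation maps; here they are plain vectors so the re-ranking logic stands on its own.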
Related papers
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP identifies co-visible image sections by obtaining patch-level embeddings with a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
- Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models [21.17975741743583]
It has recently been discovered that using a pre-trained vision-language model (VLM), e.g., CLIP, to align a whole query image with several finer text descriptions can significantly enhance zero-shot performance.
In this paper, we empirically find that the finer descriptions tend to align more effectively with local areas of the query image rather than the whole image.
arXiv Detail & Related papers (2024-06-05T04:08:41Z)
- Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark [46.166955777187816]
This paper focuses on understanding the role of image retrieval for multiple visual localization paradigms.
We introduce a novel benchmark setup and compare state-of-the-art retrieval representations on multiple datasets.
Using these tools and in-depth analysis, we show that retrieval performance on classical landmark retrieval or place recognition tasks correlates only for some but not all paradigms to localization performance.
arXiv Detail & Related papers (2022-05-31T12:59:01Z)
- ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation with two separate neural networks.
By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are not directly regressed but refined.
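The refine-rather-than-regress idea can be illustrated with a toy coarse-to-fine search: candidates are scored in a latent space and the search window shrinks around the best one at each level. This is a loose sketch of the general hierarchical-evaluation pattern, not ImPosing itself; the 1-D pose parameter, `score_fn`, and the shrink schedule are all assumptions.

```python
import numpy as np

def hierarchical_pose_search(score_fn, center, half_width, levels=4, grid=5):
    """Coarse-to-fine search over a 1-D pose parameter.

    score_fn returns a latent-space similarity for a candidate pose
    (higher is better).  At each level a small grid of candidates is
    scored and the window shrinks around the winner, so the pose is
    refined through the latent space rather than regressed directly.
    """
    best = center
    for _ in range(levels):
        candidates = np.linspace(best - half_width, best + half_width, grid)
        best = candidates[np.argmax([score_fn(c) for c in candidates])]
        half_width /= grid  # narrow the search window around the winner
    return best
```

With a smooth, single-peaked `score_fn` this converges to the maximizer; a real system would score full 6-DoF poses with learned encoders.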
arXiv Detail & Related papers (2022-05-05T13:33:25Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets a new state of the art in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Adversarial Transfer of Pose Estimation Regression [11.117357750374035]
We develop a deep adaptation network for learning scene-invariant image representations and use adversarial learning to generate representations for model transfer.
We evaluate our network on two public datasets, Cambridge Landmarks and 7Scenes, demonstrate its superiority over several baselines, and compare it to state-of-the-art methods.
arXiv Detail & Related papers (2020-06-20T21:16:37Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
- Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)
- Adaptive Semantic-Visual Tree for Hierarchical Embeddings [67.01307058209709]
We propose a hierarchical adaptive semantic-visual tree to depict the architecture of merchandise categories.
The tree evaluates semantic similarities between different semantic levels and visual similarities within the same semantic class simultaneously.
At each level, we set different margins based on the semantic hierarchy and incorporate them as prior information to learn a fine-grained feature embedding.
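The level-dependent-margin idea can be sketched with a standard triplet loss whose margin depends on where the anchor and negative diverge in the semantic tree. This is an illustrative guess at one such scheme, not the paper's formulation; `hierarchy_margin` and its `base`/`step` parameters are hypothetical.

```python
import numpy as np

def triplet_loss(anchor, pos, neg, margin):
    # Standard hinge-style triplet loss on Euclidean embedding distances.
    d_pos = np.linalg.norm(anchor - pos)
    d_neg = np.linalg.norm(anchor - neg)
    return max(0.0, d_pos - d_neg + margin)

def hierarchy_margin(lca_depth, max_depth, base=0.2, step=0.1):
    # Hypothetical margin schedule: negatives whose lowest common
    # ancestor with the anchor sits higher in the tree (smaller
    # lca_depth) are semantically farther and get a larger margin.
    return base + step * (max_depth - lca_depth)
```

Fine-grained negatives (deep shared ancestor) thus receive a small margin, while coarse negatives must be pushed farther apart, mirroring the prior-information role the margins play above.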
arXiv Detail & Related papers (2020-03-08T03:36:42Z)
- Learning to Compare Relation: Semantic Alignment for Few-Shot Learning [48.463122399494175]
We present a novel semantic alignment model to compare relations, which is robust to content misalignment.
We conduct extensive experiments on several few-shot learning datasets.
arXiv Detail & Related papers (2020-02-29T08:37:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.