Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance
- URL: http://arxiv.org/abs/2003.13431v1
- Date: Mon, 30 Mar 2020 12:56:44 GMT
- Title: Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance
- Authors: Jaime Spencer, Richard Bowden, Simon Hadfield
- Abstract summary: "Like night and day" is a commonly used expression to imply that two things are completely different.
The aim of this paper is to provide a dense feature representation that can be used to perform localization, sparse matching or image retrieval.
We propose Deja-Vu, a weakly supervised approach to learning season invariant features that does not require pixel-wise ground truth data.
- Score: 65.94499390875046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: "Like night and day" is a commonly used expression to imply that two things
are completely different. Unfortunately, this tends to be the case for current
visual feature representations of the same scene across varying seasons or
times of day. The aim of this paper is to provide a dense feature
representation that can be used to perform localization, sparse matching or
image retrieval, regardless of the current seasonal or temporal appearance.
Recently, there have been several proposed methodologies for deep learning
dense feature representations. These methods make use of ground truth
pixel-wise correspondences between pairs of images and focus on the spatial
properties of the features. As such, they don't address temporal or seasonal
variation. Furthermore, obtaining the required pixel-wise correspondence data
to train in cross-seasonal environments is highly complex in most scenarios.
We propose Deja-Vu, a weakly supervised approach to learning season invariant
features that does not require pixel-wise ground truth data. The proposed
system only requires coarse labels indicating if two images correspond to the
same location or not. From these labels, the network is trained to produce
"similar" dense feature maps for corresponding locations despite environmental
changes. Code will be made available at:
https://github.com/jspenmar/DejaVu_Features
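To make the training signal concrete, here is a minimal PyTorch-style sketch of the weak supervision described above: a contrastive objective driven only by a binary same-location label, with no pixel-wise correspondences. The `DenseEncoder`, the global pooling, and the `margin` value are illustrative assumptions for this sketch, not the authors' Deja-Vu architecture; the repository linked above contains the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseEncoder(nn.Module):
    """Toy fully convolutional encoder producing a dense, per-pixel feature map.

    Stands in for the paper's network; the real architecture is in the repo.
    """
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_dim, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalise each pixel's descriptor so distances are comparable.
        return F.normalize(self.net(x), dim=1)

def weak_pair_loss(feat_a: torch.Tensor, feat_b: torch.Tensor,
                   same_place: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Contrastive loss from coarse place labels only (assumed form, not the paper's).

    feat_a, feat_b: (B, C, H, W) dense feature maps of an image pair.
    same_place:     (B,) floats, 1.0 if the pair shows the same location.
    """
    # Without pixel-wise ground truth we cannot align individual pixels,
    # so this sketch compares globally pooled descriptors of the two maps.
    desc_a = feat_a.mean(dim=(2, 3))
    desc_b = feat_b.mean(dim=(2, 3))
    dist = F.pairwise_distance(desc_a, desc_b)
    pull = same_place * dist.pow(2)                           # same place: pull together
    push = (1.0 - same_place) * F.relu(margin - dist).pow(2)  # different: push at least `margin` apart
    return (pull + push).mean()

# Usage: a pair of 3-channel images and binary same-location labels.
encoder = DenseEncoder()
img_a, img_b = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
labels = torch.tensor([1.0, 0.0])
loss = weak_pair_loss(encoder(img_a), encoder(img_b), labels)
loss.backward()
```

The global pooling is the main simplification here: it sidesteps pixel alignment entirely, whereas the paper's stated goal is similarity of the full dense feature maps across seasonal and temporal appearance changes.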
Related papers
- Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence [12.602194710071116]
This paper presents a new method that uses semantic cues from foundation vision model features to enhance local feature matching.
We present adapted versions of six existing descriptors, with an average increase in performance of 29% in camera localization.
arXiv Detail & Related papers (2024-10-12T13:45:26Z) - Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning [71.14084801851381]
Change captioning aims to succinctly describe the semantic change between a pair of similar images.
Most existing methods directly capture the difference between the two images, which risks producing error-prone difference features.
We propose a distractors-immune representation learning network that correlates the corresponding channels of two image representations.
arXiv Detail & Related papers (2024-07-16T13:00:33Z) - Context-aware Difference Distilling for Multi-change Captioning [106.72151597074098]
Multi-change captioning aims to describe complex and coupled changes within an image pair in natural language.
We propose a novel context-aware difference distilling network that captures all genuine changes for caption generation.
arXiv Detail & Related papers (2024-05-31T14:07:39Z) - Semantic Pose Verification for Outdoor Visual Localization with
Self-supervised Contrastive Learning [0.0]
We exploit semantic content to improve visual localization.
In our scenario, the database consists of gnomonic views generated from panoramic images.
We train a CNN in a self-supervised fashion with contrastive learning on a dataset of semantically segmented images.
arXiv Detail & Related papers (2022-03-31T11:09:38Z) - Spatially Multi-conditional Image Generation [80.04130168156792]
We propose a novel neural architecture to address the problem of multi-conditional image generation.
The proposed method uses a transformer-like architecture operating pixel-wise, which receives the available labels as input tokens.
Our experiments on three benchmark datasets demonstrate the clear superiority of our method over the state-of-the-art and the compared baselines.
arXiv Detail & Related papers (2022-03-25T17:57:13Z) - Sparse Spatial Transformers for Few-Shot Learning [6.271261279657655]
Learning from limited data is challenging because data scarcity leads to poor generalization of the trained model.
We propose a novel transformer-based neural network architecture called sparse spatial transformers.
Our method finds task-relevant features and suppresses task-irrelevant features.
arXiv Detail & Related papers (2021-09-27T10:36:32Z) - i3dLoc: Image-to-range Cross-domain Localization Robust to Inconsistent
Environmental Conditions [9.982307144353713]
We present a method for localizing a single camera with respect to a point cloud map in indoor and outdoor scenes.
Our method can match equirectangular images to the 3D range projections by extracting cross-domain symmetric place descriptors.
With a single trained model, i3dLoc demonstrates reliable visual localization under inconsistent environmental conditions.
arXiv Detail & Related papers (2021-05-27T00:13:11Z) - Region Similarity Representation Learning [94.88055458257081]
Region Similarity Representation Learning (ReSim) is a new approach to self-supervised representation learning for localization-based tasks.
ReSim learns both regional representations for localization as well as semantic image-level representations.
We show how ReSim learns representations which significantly improve the localization and classification performance compared to a competitive MoCo-v2 baseline.
arXiv Detail & Related papers (2021-03-24T00:42:37Z) - Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z) - Unsupervised Learning of Dense Visual Representations [14.329781842154281]
We propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations.
VADeR learns pixelwise representations by forcing local features to remain constant over different viewing conditions.
Our method outperforms ImageNet supervised pretraining in multiple dense prediction tasks.
arXiv Detail & Related papers (2020-11-11T01:28:11Z)