Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance
- URL: http://arxiv.org/abs/2003.13431v1
- Date: Mon, 30 Mar 2020 12:56:44 GMT
- Title: Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance
- Authors: Jaime Spencer, Richard Bowden, Simon Hadfield
- Abstract summary: "Like night and day" is a commonly used expression to imply that two things are completely different.
The aim of this paper is to provide a dense feature representation that can be used to perform localization, sparse matching or image retrieval.
We propose Deja-Vu, a weakly supervised approach to learning season invariant features that does not require pixel-wise ground truth data.
- Score: 65.94499390875046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: "Like night and day" is a commonly used expression to imply that two things
are completely different. Unfortunately, this tends to be the case for current
visual feature representations of the same scene across varying seasons or
times of day. The aim of this paper is to provide a dense feature
representation that can be used to perform localization, sparse matching or
image retrieval, regardless of the current seasonal or temporal appearance.
Recently, there have been several proposed methodologies for deep learning
dense feature representations. These methods make use of ground truth
pixel-wise correspondences between pairs of images and focus on the spatial
properties of the features. As such, they don't address temporal or seasonal
variation. Furthermore, obtaining the required pixel-wise correspondence data
to train in cross-seasonal environments is highly complex in most scenarios.
We propose Deja-Vu, a weakly supervised approach to learning season invariant
features that does not require pixel-wise ground truth data. The proposed
system only requires coarse labels indicating if two images correspond to the
same location or not. From these labels, the network is trained to produce
"similar" dense feature maps for corresponding locations despite environmental
changes. Code will be made available at:
https://github.com/jspenmar/DejaVu_Features
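To make the training signal concrete, here is a minimal PyTorch-style sketch of the weak supervision described above: a contrastive objective driven only by a binary same-location label, with no pixel-wise correspondences. The `DenseEncoder`, the global pooling, and the `margin` value are illustrative assumptions for this sketch, not the authors' Deja-Vu architecture; the repository linked above contains the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseEncoder(nn.Module):
    """Toy fully convolutional encoder producing a dense, per-pixel feature map.

    Stands in for the paper's network; the real architecture is in the repo.
    """
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_dim, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalise each pixel's descriptor so distances are comparable.
        return F.normalize(self.net(x), dim=1)

def weak_pair_loss(feat_a: torch.Tensor, feat_b: torch.Tensor,
                   same_place: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Contrastive loss from coarse place labels only (assumed form, not the paper's).

    feat_a, feat_b: (B, C, H, W) dense feature maps of an image pair.
    same_place:     (B,) floats, 1.0 if the pair shows the same location.
    """
    # Without pixel-wise ground truth we cannot align individual pixels,
    # so this sketch compares globally pooled descriptors of the two maps.
    desc_a = feat_a.mean(dim=(2, 3))
    desc_b = feat_b.mean(dim=(2, 3))
    dist = F.pairwise_distance(desc_a, desc_b)
    pull = same_place * dist.pow(2)                           # same place: pull together
    push = (1.0 - same_place) * F.relu(margin - dist).pow(2)  # different: push at least `margin` apart
    return (pull + push).mean()

# Usage: a pair of 3-channel images and binary same-location labels.
encoder = DenseEncoder()
img_a, img_b = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
labels = torch.tensor([1.0, 0.0])
loss = weak_pair_loss(encoder(img_a), encoder(img_b), labels)
loss.backward()
```

The global pooling is the main simplification here: it sidesteps pixel alignment entirely, whereas the paper's stated goal is similarity of the full dense feature maps across seasonal and temporal appearance changes.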
Related papers
- Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence [12.602194710071116]
This paper presents a new method that uses semantic cues from foundation vision model features to enhance local feature matching.
We present adapted versions of six existing descriptors, with an average increase in performance of 29% in camera localization.
arXiv Detail & Related papers (2024-10-12T13:45:26Z) - Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning [71.14084801851381]
Change captioning aims to succinctly describe the semantic change between a pair of similar images.
Most existing methods directly capture the difference between the two images, which risks producing error-prone difference features.
We propose a distractors-immune representation learning network that correlates the corresponding channels of two image representations.
arXiv Detail & Related papers (2024-07-16T13:00:33Z) - Context-aware Difference Distilling for Multi-change Captioning [106.72151597074098]
Multi-change captioning aims to describe complex and coupled changes within an image pair in natural language.
We propose a novel context-aware difference distilling network that captures all genuine changes for caption generation.
arXiv Detail & Related papers (2024-05-31T14:07:39Z) - Semantic Pose Verification for Outdoor Visual Localization with
Self-supervised Contrastive Learning [0.0]
We exploit semantic content to improve visual localization.
In our scenario, the database consists of gnomonic views generated from panoramic images.
We train a CNN in a self-supervised fashion with contrastive learning on a dataset of semantically segmented images.
arXiv Detail & Related papers (2022-03-31T11:09:38Z) - Spatially Multi-conditional Image Generation [80.04130168156792]
We propose a novel neural architecture to address the problem of multi-conditional image generation.
The proposed method uses a transformer-like architecture operating pixel-wise, which receives the available labels as input tokens.
Our experiments on three benchmark datasets demonstrate the clear superiority of our method over the state-of-the-art and the compared baselines.
arXiv Detail & Related papers (2022-03-25T17:57:13Z) - Sparse Spatial Transformers for Few-Shot Learning [6.271261279657655]
Learning from limited data is challenging because data scarcity leads to poor generalization of the trained model.
We propose a novel transformer-based neural network architecture called sparse spatial transformers.
Our method finds task-relevant features and suppresses task-irrelevant features.
arXiv Detail & Related papers (2021-09-27T10:36:32Z) - i3dLoc: Image-to-range Cross-domain Localization Robust to Inconsistent
Environmental Conditions [9.982307144353713]
We present a method for localizing a single camera with respect to a point cloud map in indoor and outdoor scenes.
Our method can match equirectangular images to the 3D range projections by extracting cross-domain symmetric place descriptors.
With a single trained model, i3dLoc demonstrates reliable visual localization under inconsistent environmental conditions.
arXiv Detail & Related papers (2021-05-27T00:13:11Z) - Region Similarity Representation Learning [94.88055458257081]
Region Similarity Representation Learning (ReSim) is a new approach to self-supervised representation learning for localization-based tasks.
ReSim learns both regional representations for localization as well as semantic image-level representations.
We show how ReSim learns representations which significantly improve the localization and classification performance compared to a competitive MoCo-v2 baseline.
arXiv Detail & Related papers (2021-03-24T00:42:37Z) - Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z) - Unsupervised Learning of Dense Visual Representations [14.329781842154281]
We propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations.
VADeR learns pixelwise representations by forcing local features to remain constant over different viewing conditions.
Our method outperforms ImageNet supervised pretraining in multiple dense prediction tasks.
arXiv Detail & Related papers (2020-11-11T01:28:11Z)