Generalized Contrastive Optimization of Siamese Networks for Place
Recognition
- URL: http://arxiv.org/abs/2103.06638v4
- Date: Thu, 20 Apr 2023 09:24:25 GMT
- Title: Generalized Contrastive Optimization of Siamese Networks for Place
Recognition
- Authors: María Leyva-Vallina, Nicola Strisciuglio, Nicolai Petkov
- Abstract summary: We propose a Generalized Contrastive loss function that relies on image similarity as a continuous measure, and use it to train a siamese CNN.
We demonstrate that siamese CNNs trained using the GCL function and the improved annotations consistently outperform their binary counterparts.
Our models trained on MSLS outperform the state-of-the-art methods, including NetVLAD, NetVLAD-SARE, AP-GeM and Patch-NetVLAD, and generalize well on the Pittsburgh30k, Tokyo 24/7, RobotCar Seasons v2 and Extended CMU Seasons datasets.
- Score: 10.117451511942267
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual place recognition is a challenging task in computer vision and a key
component of camera-based localization and navigation systems. Recently,
Convolutional Neural Networks (CNNs) achieved high results and good
generalization capabilities. They are usually trained using pairs or triplets
of images labeled as either similar or dissimilar, in a binary fashion. In
practice, the similarity between two images is not binary, but continuous.
Furthermore, training these CNNs is computationally complex and involves costly
pair and triplet mining strategies. We propose a Generalized Contrastive loss
(GCL) function that relies on image similarity as a continuous measure, and use
it to train a siamese CNN. Furthermore, we present three techniques for
automatic annotation of image pairs with labels indicating their degree of
similarity, and deploy them to re-annotate the MSLS, TB-Places, and 7Scenes
datasets. We demonstrate that siamese CNNs trained using the GCL function and
the improved annotations consistently outperform their binary counterparts. Our
models trained on MSLS outperform the state-of-the-art methods, including
NetVLAD, NetVLAD-SARE, AP-GeM and Patch-NetVLAD, and generalize well on the
Pittsburgh30k, Tokyo 24/7, RobotCar Seasons v2 and Extended CMU Seasons
datasets. Furthermore, training a siamese network using the GCL function does
not require complex pair mining. We release the source code at
https://github.com/marialeyvallina/generalized_contrastive_loss.
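
As a rough illustration of what "image similarity as a continuous measure" can mean for a contrastive loss, the sketch below weights the two terms of the classic contrastive loss by a similarity label in [0, 1] instead of a binary label. The abstract does not state the formula, so this form is an assumption. The function names, the Euclidean distance, and the default margin are also illustrative choices, not taken from the released code; the authors' repository remains the authoritative implementation.

```python
import math


def euclidean_distance(desc_a, desc_b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(desc_a, desc_b)))


def generalized_contrastive_loss(desc_a, desc_b, similarity, margin=0.5):
    """Contrastive loss with a continuous similarity label (assumed form).

    similarity: graded label in [0, 1]; 1 = same place, 0 = unrelated.
    margin: hypothetical default; pairs further apart than this are not
            pushed away any more.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    d = euclidean_distance(desc_a, desc_b)
    attract = 0.5 * d ** 2                    # pulls similar pairs together
    repel = 0.5 * max(margin - d, 0.0) ** 2   # pushes dissimilar pairs apart
    # The continuous label blends the two terms instead of selecting one.
    return similarity * attract + (1.0 - similarity) * repel
```

Under this assumed form, a label of exactly 0 or 1 recovers the usual binary contrastive loss, while intermediate labels let partially overlapping views contribute to both terms.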
Related papers
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z) - Extracting Semantic Knowledge from GANs with Unsupervised Learning [65.32631025780631]
Generative Adversarial Networks (GANs) encode semantics in feature maps in a linearly separable form.
We propose a novel clustering algorithm, named KLiSH, which leverages the linear separability to cluster GAN's features.
KLiSH succeeds in extracting fine-grained semantics of GANs trained on datasets of various objects.
arXiv Detail & Related papers (2022-11-30T03:18:16Z) - SVNet: Where SO(3) Equivariance Meets Binarization on Point Cloud
Representation [65.4396959244269]
The paper tackles the challenge by designing a general framework to construct 3D learning architectures.
The proposed approach can be applied to general backbones like PointNet and DGCNN.
Experiments on ModelNet40, ShapeNet, and the real-world dataset ScanObjectNN, demonstrated that the method achieves a great trade-off between efficiency, rotation, and accuracy.
arXiv Detail & Related papers (2022-09-13T12:12:19Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image
Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - Gate-Shift-Fuse for Video Action Recognition [43.8525418821458]
Gate-Shift-Fuse (GSF) is a novel spatio-temporal feature extraction module which controls interactions in spatio-temporal decomposition and learns to adaptively route features through time and combine them in a data dependent manner.
GSF can be inserted into existing 2D CNNs to convert them into efficient and high performing spatio-temporal feature extractors, with negligible parameter and compute overhead.
We perform an extensive analysis of GSF using two popular 2D CNN families and achieve state-of-the-art or competitive performance on five standard action recognition benchmarks.
arXiv Detail & Related papers (2022-03-16T19:19:04Z) - Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z) - Do End-to-end Stereo Algorithms Under-utilize Information? [7.538482310185133]
We show how deep adaptive filtering and differentiable semi-global aggregation can be integrated in 2D and 3D convolutional networks for end-to-end stereo matching.
The improvements are due to utilizing RGB information from the images as a signal to dynamically guide the matching process.
arXiv Detail & Related papers (2020-10-14T18:32:39Z) - Pairwise Relation Learning for Semi-supervised Gland Segmentation [90.45303394358493]
We propose a pairwise relation-based semi-supervised (PRS2) model for gland segmentation on histology images.
This model consists of a segmentation network (S-Net) and a pairwise relation network (PR-Net).
We evaluate our model against five recent methods on the GlaS dataset and three recent methods on the CRAG dataset.
arXiv Detail & Related papers (2020-08-06T15:02:38Z) - Reinforcement Learning Based Handwritten Digit Recognition with
Two-State Q-Learning [1.8782750537161614]
We present a Hybrid approach based on Deep Learning and Reinforcement Learning.
Q-Learning is used with two Q-states and four actions.
Our approach outperforms other contemporary techniques like AlexNet, CNN-Nearest Neighbor and CNN-Support Vector Machine.
arXiv Detail & Related papers (2020-06-28T14:23:36Z) - On the Texture Bias for Few-Shot CNN Segmentation [21.349705243254423]
Convolutional Neural Networks (CNNs) were initially believed to be driven by shapes when performing visual recognition tasks.
Recent evidence suggests texture bias in CNNs provides higher performing models when learning on large labeled training datasets.
We propose a novel architecture that integrates a set of Difference of Gaussians (DoG) to attenuate high-frequency local components in the feature space.
arXiv Detail & Related papers (2020-03-09T11:55:47Z)