Learning Condition Invariant Features for Retrieval-Based Localization from 1M Images
- URL: http://arxiv.org/abs/2008.12165v2
- Date: Tue, 8 Dec 2020 23:43:12 GMT
- Title: Learning Condition Invariant Features for Retrieval-Based Localization from 1M Images
- Authors: Janine Thoma, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool
- Abstract summary: We develop a novel method for learning more accurate and better generalizing localization features.
On the challenging Oxford RobotCar night condition, our method outperforms the well-known triplet loss by 24.4% in localization accuracy within 5m.
- Score: 85.81073893916414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image features for retrieval-based localization must be invariant to dynamic
objects (e.g. cars) as well as seasonal and daytime changes. Such invariances
are, to some extent, learnable with existing methods using triplet-like
losses, given a large number of diverse training images. However, due to the
high algorithmic training complexity, comparisons between different loss
functions on large datasets remain insufficient. In this paper, we train and
evaluate several localization methods on three different benchmark datasets,
including Oxford RobotCar with over one million images. This large scale
evaluation yields valuable insights into the generalizability and performance
of retrieval-based localization. Based on our findings, we develop a novel
method for learning more accurate and better generalizing localization
features. It consists of two main contributions: (i) a feature volume-based
loss function, and (ii) hard positive and pairwise negative mining. On the
challenging Oxford RobotCar night condition, our method outperforms the
well-known triplet loss by 24.4% in localization accuracy within 5m.
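As context for the comparison above, a plain triplet loss with hard positive and hard negative mining can be sketched as follows. This is an illustrative baseline only: the function names and the exact mining strategy are our own assumptions, and the paper's actual contributions (a feature volume-based loss and pairwise negative mining) are not reproduced here.

```python
import math

def dist(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mined_triplet_loss(anchor, positives, negatives, margin=1.0):
    """Triplet hinge loss with hard positive and hard negative mining.

    Hard positive: the positive farthest from the anchor.
    Hard negative: the negative closest to the anchor.
    """
    hard_pos = max(positives, key=lambda p: dist(anchor, p))
    hard_neg = min(negatives, key=lambda n: dist(anchor, n))
    return max(0.0, dist(anchor, hard_pos) - dist(anchor, hard_neg) + margin)
```

For example, with an anchor at the origin, positives at distances 0.1 and 0.3, and negatives at distances 1.0 and 2.0, the mined triplet uses the 0.3 positive and the 1.0 negative, giving a loss of 0.3 at margin 1.0.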
Related papers
- Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval [1.907072234794597]
We develop a framework for a universal feature extractor that provides strong semantic image representations across various domains.
We achieve near state-of-the-art results on the Google Universal Image Embedding Challenge, with a mMP@5 of 0.721.
Compared to methods with similar computational requirements, we outperform the previous state of the art by 3.3 percentage points.
arXiv Detail & Related papers (2024-09-20T13:53:13Z)
- Class Anchor Margin Loss for Content-Based Image Retrieval [97.81742911657497]
We propose a novel repeller-attractor loss that falls within the metric learning paradigm, yet directly optimizes the L2 metric without the need to generate pairs.
We evaluate the proposed objective in the context of few-shot and full-set training on the CBIR task, by using both convolutional and transformer architectures.
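A repeller-attractor objective of this kind can be sketched with learnable per-class anchors: the embedding is pulled toward its own class anchor and pushed away from the others, directly in the L2 metric and without sampling pairs or triplets. The anchor set, margin, and functional form below are illustrative assumptions, not the cited paper's definition.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def class_anchor_margin_loss(embedding, label, anchors, margin=1.0):
    """Repeller-attractor sketch over per-class anchors.

    Attractor term: squared distance to the anchor of the true class.
    Repeller term: hinge penalty for being within `margin` of any
    other class anchor.
    """
    pull = euclidean(embedding, anchors[label]) ** 2
    push = sum(max(0.0, margin - euclidean(embedding, a)) ** 2
               for c, a in anchors.items() if c != label)
    return pull + push
```

An embedding sitting exactly on its own anchor and farther than the margin from all other anchors incurs zero loss; drifting toward a foreign anchor activates the repeller term.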
arXiv Detail & Related papers (2023-06-01T12:53:10Z)
- SuSana Distancia is all you need: Enforcing class separability in metric learning via two novel distance-based loss functions for few-shot image classification [0.9236074230806579]
We propose two loss functions which weight the embedding vectors by the intra-class and inter-class distances among the few available data.
Our results show a 2% accuracy improvement on the miniImageNet benchmark compared to other metric-based few-shot learning methods.
arXiv Detail & Related papers (2023-05-15T23:12:09Z)
- Instance-Variant Loss with Gaussian RBF Kernel for 3D Cross-modal Retrieval [52.41252219453429]
Existing methods treat all instances equally, applying the same penalty strength to instances with varying degrees of difficulty.
This can result in ambiguous convergence or local optima, severely compromising the separability of the feature space.
We propose an Instance-Variant loss to assign different penalty strengths to different instances, improving the space separability.
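The per-instance weighting idea can be sketched, for example, as a softmax over per-instance losses, so that difficult instances (large loss) receive a larger share of the total penalty instead of every instance sharing one uniform strength. The softmax form and temperature parameter are our own assumptions; the cited paper's Gaussian-RBF formulation differs in detail.

```python
import math

def instance_variant_weights(losses, temperature=1.0):
    """Assign larger penalty strength to harder instances.

    Weights are a softmax over per-instance losses: instances with a
    larger loss get a larger weight, and all weights sum to one.
    """
    exps = [math.exp(l / temperature) for l in losses]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_total_loss(losses, temperature=1.0):
    """Total loss with instance-variant penalty strengths."""
    weights = instance_variant_weights(losses, temperature)
    return sum(w * l for w, l in zip(weights, losses))
```

Lowering the temperature concentrates the penalty on the hardest instances; raising it recovers the uniform treatment that the summary above criticizes.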
arXiv Detail & Related papers (2023-05-07T10:12:14Z)
- Improving Point Cloud Based Place Recognition with Ranking-based Loss and Large Batch Training [1.116812194101501]
The paper presents a simple and effective learning-based method for computing a discriminative 3D point cloud descriptor.
We employ recent advances in image retrieval and propose a modified version of a loss function based on a differentiable average precision approximation.
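A differentiable average-precision approximation of this flavor can be sketched by replacing the hard rank indicator [s_j > s_i] with a temperature-controlled sigmoid, so the retrieval metric itself becomes amenable to gradient descent. This is a generic smoothed-AP sketch under our own naming; the cited paper's modified loss differs in detail.

```python
import math

def sigmoid(x, tau=0.01):
    """Temperature-controlled sigmoid relaxing the 0/1 rank indicator."""
    return 1.0 / (1.0 + math.exp(-x / tau))

def smooth_ap(pos_scores, neg_scores, tau=0.01):
    """Differentiable approximation of average precision for one query.

    For each positive, its soft rank among positives is divided by its
    soft rank among all retrieved items; averaging these ratios over
    the positives approximates AP as tau -> 0.
    """
    ap = 0.0
    for i, s_i in enumerate(pos_scores):
        # soft rank of s_i among the positives (1-based)
        r_pos = 1.0 + sum(sigmoid(pos_scores[j] - s_i, tau)
                          for j in range(len(pos_scores)) if j != i)
        # soft rank of s_i among all retrieved items
        r_all = r_pos + sum(sigmoid(s_n - s_i, tau) for s_n in neg_scores)
        ap += r_pos / r_all
    return ap / len(pos_scores)
```

With a small temperature, a perfect ranking (all positives scored above all negatives) yields an AP close to 1, and the approximation degrades smoothly as negatives outrank positives.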
arXiv Detail & Related papers (2022-03-02T09:29:28Z)
- Do Lessons from Metric Learning Generalize to Image-Caption Retrieval? [67.45267657995748]
The triplet loss with semi-hard negatives has become the de facto choice for image-caption retrieval (ICR) methods that are optimized from scratch.
Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning.
We ask whether these findings generalize to the setting of ICR by comparing three loss functions on two ICR methods.
arXiv Detail & Related papers (2022-02-14T15:18:00Z)
- DenserNet: Weakly Supervised Visual Localization Using Multi-scale Feature Aggregation [7.2531609092488445]
First, we develop a convolutional neural network architecture that aggregates feature maps at different semantic levels for image representations.
Second, our model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs.
Third, our method is computationally efficient as our architecture has shared features and parameters during computation.
arXiv Detail & Related papers (2020-12-04T02:16:47Z)
- Domain-invariant Similarity Activation Map Contrastive Learning for Retrieval-based Long-term Visual Localization [30.203072945001136]
In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation.
Then, a novel gradient-weighted similarity activation mapping loss (Grad-SAM) is incorporated for finer localization with high accuracy.
Extensive experiments validate the effectiveness of the proposed approach on the CMU-Seasons dataset.
Our performance is on par with or even outperforms the state-of-the-art image-based localization baselines in medium or high precision.
arXiv Detail & Related papers (2020-09-16T14:43:22Z)
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves a Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose aggregate interaction modules to integrate features from adjacent levels.
To obtain more efficient multi-scale features, self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all of the above) and is not responsible for any consequences of its use.