DenserNet: Weakly Supervised Visual Localization Using Multi-scale
Feature Aggregation
- URL: http://arxiv.org/abs/2012.02366v4
- Date: Thu, 11 Mar 2021 21:06:01 GMT
- Title: DenserNet: Weakly Supervised Visual Localization Using Multi-scale
Feature Aggregation
- Authors: Dongfang Liu, Yiming Cui, Liqi Yan, Christos Mousas, Baijian Yang,
Yingjie Chen
- Abstract summary: We develop a convolutional neural network architecture which aggregates feature maps at different semantic levels for image representations.
Second, our model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs.
Third, our method is computationally efficient as our architecture has shared features and parameters during computation.
- Score: 7.2531609092488445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we introduce a Denser Feature Network (DenserNet) for visual
localization. Our work provides three principal contributions. First, we
develop a convolutional neural network (CNN) architecture which aggregates
feature maps at different semantic levels for image representations. Using
denser feature maps, our method can produce more keypoint features and increase
image retrieval accuracy. Second, our model is trained end-to-end without
pixel-level annotation other than positive and negative GPS-tagged image pairs.
We use a weakly supervised triplet ranking loss to learn discriminative
features and encourage keypoint feature repeatability for image representation.
Finally, our method is computationally efficient as our architecture has shared
features and parameters during computation. Our method can perform accurate
large-scale localization under challenging conditions while remaining the
computational constraint. Extensive experiment results indicate that our method
sets a new state-of-the-art on four challenging large-scale localization
benchmarks and three image retrieval benchmarks.
Related papers
- Decomposition of Neural Discrete Representations for Large-Scale 3D Mapping [15.085191496726967]
We introduce Decomposition-based Neural Mapping (DNMap)
DNMap is a storage-efficient large-scale 3D mapping method.
We learn low-resolution continuous embeddings that require tiny storage space.
arXiv Detail & Related papers (2024-07-22T11:32:33Z) - A Triplet-loss Dilated Residual Network for High-Resolution
Representation Learning in Image Retrieval [0.0]
In some applications, such as localization, image retrieval is employed as the initial step.
The current paper introduces a simple yet efficient image retrieval system with a fewer trainable parameters.
The proposed method benefits from a dilated residual convolutional neural network with triplet loss.
arXiv Detail & Related papers (2023-03-15T07:01:44Z) - LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of
Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even under drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z) - Learning Super-Features for Image Retrieval [34.22539650643026]
We propose a novel architecture for deep image retrieval based solely on mid-level features that we call Super-features.
Experiments on common landmark retrieval benchmarks validate that Super-features substantially outperform state-of-the-art methods when using the same number of features.
arXiv Detail & Related papers (2022-01-31T12:48:42Z) - GCNDepth: Self-supervised Monocular Depth Estimation based on Graph
Convolutional Network [11.332580333969302]
This work brings a new solution with a set of improvements, which increase the quantitative and qualitative understanding of depth maps.
A graph convolutional network (GCN) can handle the convolution on non-Euclidean data and it can be applied to irregular image regions within a topological structure.
Our method provided comparable and promising results with a high prediction accuracy of 89% on the publicly KITTI and Make3D datasets.
arXiv Detail & Related papers (2021-12-13T16:46:25Z) - Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z) - KiU-Net: Overcomplete Convolutional Architectures for Biomedical Image
and Volumetric Segmentation [71.79090083883403]
"Traditional" encoder-decoder based approaches perform poorly in detecting smaller structures and are unable to segment boundary regions precisely.
We propose KiU-Net which has two branches: (1) an overcomplete convolutional network Kite-Net which learns to capture fine details and accurate edges of the input, and (2) U-Net which learns high level features.
The proposed method achieves a better performance as compared to all the recent methods with an additional benefit of fewer parameters and faster convergence.
arXiv Detail & Related papers (2020-10-04T19:23:33Z) - Learning Condition Invariant Features for Retrieval-Based Localization
from 1M Images [85.81073893916414]
We develop a novel method for learning more accurate and better generalizing localization features.
On the challenging Oxford RobotCar night condition, our method outperforms the well-known triplet loss by 24.4% in localization accuracy within 5m.
arXiv Detail & Related papers (2020-08-27T14:46:22Z) - Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
arXiv Detail & Related papers (2020-07-14T16:43:02Z) - Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z) - Image Matching across Wide Baselines: From Paper to Practice [80.9424750998559]
We introduce a comprehensive benchmark for local features and robust estimation algorithms.
Our pipeline's modular structure allows easy integration, configuration, and combination of different methods.
We show that with proper settings, classical solutions may still outperform the perceived state of the art.
arXiv Detail & Related papers (2020-03-03T15:20:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.