Deep SIMBAD: Active Landmark-based Self-localization Using Ranking-based Scene Descriptor
- URL: http://arxiv.org/abs/2109.02786v1
- Date: Mon, 6 Sep 2021 23:51:27 GMT
- Title: Deep SIMBAD: Active Landmark-based Self-localization Using Ranking-based Scene Descriptor
- Authors: Tanaka Kanji
- Abstract summary: We consider an active self-localization task by an active observer and present a novel reinforcement learning (RL)-based next-best-view (NBV) planner.
Experiments using the public NCLT dataset validated the effectiveness of the proposed approach.
- Score: 5.482532589225552
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Landmark-based robot self-localization has recently garnered interest as a
highly-compressive domain-invariant approach for performing visual place
recognition (VPR) across domains (e.g., time of day, weather, and season).
However, landmark-based self-localization can be an ill-posed problem for a
passive observer (e.g., manual robot control), as many viewpoints may not
provide an effective landmark view. In this study, we consider an active
self-localization task by an active observer and present a novel reinforcement
learning (RL)-based next-best-view (NBV) planner. Our contributions are as
follows. (1) SIMBAD-based VPR: We formulate the problem of landmark-based
compact scene description as SIMBAD (similarity-based pattern recognition) and
further present its deep learning extension. (2) VPR-to-NBV knowledge transfer:
We address the challenge of RL under uncertainty (i.e., active
self-localization) by transferring the state recognition ability of VPR to the
NBV. (3) NNQL-based NBV: We regard the available VPR as the experience database
by adapting a nearest-neighbor approximation of Q-learning (NNQL). The result
is an extremely compact data structure that compresses both the VPR and NBV
into a single incremental inverted index. Experiments using the public NCLT
dataset validated the effectiveness of the proposed approach.
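The NNQL idea above can be illustrated with a small sketch. The class below is a hypothetical toy, not the paper's implementation: each past observation is stored as a ranking of landmark IDs (a stand-in for the ranking-based scene descriptor), and a single inverted index from landmark ID to experience ID serves both roles, retrieving nearest experiences for VPR and averaging their Q-values to pick the next-best view. All names, the voting-based similarity, and the Q-value scheme are illustrative assumptions.

```python
from collections import defaultdict


class NNQLPlanner:
    """Toy sketch of one inverted index serving both VPR retrieval and
    NNQL-based NBV planning. Hypothetical, not the paper's code."""

    def __init__(self, k=3):
        self.k = k                        # number of nearest experiences to use
        self.index = defaultdict(list)    # landmark ID -> experience IDs
        self.experiences = []             # (landmark ranking, action -> Q-value)

    def add_experience(self, landmark_ranking, q_values):
        """Index a past observation: its ranked landmark IDs and the
        estimated return of each viewpoint-control action."""
        exp_id = len(self.experiences)
        self.experiences.append((landmark_ranking, q_values))
        for lm in landmark_ranking:
            self.index[lm].append(exp_id)

    def nearest(self, landmark_ranking):
        """VPR step: score experiences by shared-landmark votes."""
        votes = defaultdict(int)
        for lm in landmark_ranking:
            for exp_id in self.index[lm]:
                votes[exp_id] += 1
        return sorted(votes, key=votes.get, reverse=True)[: self.k]

    def next_best_view(self, landmark_ranking):
        """NBV step: average the Q-values of the nearest experiences
        and pick the highest-value action."""
        neighbors = self.nearest(landmark_ranking)
        q = defaultdict(float)
        for exp_id in neighbors:
            for action, value in self.experiences[exp_id][1].items():
                q[action] += value / len(neighbors)
        return max(q, key=q.get)
```

Because both queries walk the same index, adding an experience is a single incremental insertion, which is the compactness property the abstract highlights.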
Related papers
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition [6.996304653818122]
We propose a simple yet powerful approach to better exploit the potential of a foundation model for Visual Place Recognition.
We first demonstrate that features extracted from self-attention layers can serve as a powerful re-ranker for VPR.
We then demonstrate that a single-stage method leveraging internal ViT layers for pooling can generate global features that achieve state-of-the-art results.
arXiv Detail & Related papers (2024-05-28T11:24:41Z)
- OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition [10.39935021754015]
We develop OverlapMamba, a novel network for place recognition as sequences.
Our method effectively detects loop closures even when traversing previously visited locations from different directions.
Relying on raw range view inputs, it outperforms typical LiDAR and multi-view combination methods in time complexity and speed.
arXiv Detail & Related papers (2024-05-13T17:46:35Z)
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- CPR++: Object Localization via Single Coarse Point Supervision [55.8671776333499]
Coarse point refinement (CPR) is the first attempt to alleviate semantic variance from an algorithmic perspective.
CPR reduces semantic variance by selecting a semantic centre point in a neighbourhood region to replace the initial annotated point.
CPR++ can obtain scale information and further reduce the semantic variance in a global region.
arXiv Detail & Related papers (2024-01-30T17:38:48Z)
- Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
A new paradigm has emerged that generates a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z)
- Self-Supervised Visual Place Recognition by Mining Temporal and Feature Neighborhoods [17.852415436033436]
We propose a novel framework named TF-VPR that uses temporal neighborhoods and learnable feature neighborhoods to discover unknown spatial neighborhoods.
Our method follows an iterative training paradigm which alternates between: (1) representation learning with data augmentation, (2) positive set expansion to include the current feature space neighbors, and (3) positive set contraction via geometric verification.
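The three alternating steps can be sketched as a toy loop. Everything below is a hypothetical stand-in, not the authors' code: frames are 1-D "descriptors", temporal neighbors seed the positive set, a feature-distance threshold plays the role of the learned feature neighborhood, and a stricter threshold stands in for geometric verification.

```python
def tf_vpr_positives(frames, temporal_window=1, feat_thresh=0.5,
                     geo_thresh=0.3, n_iters=2):
    """Toy sketch of the expand/contract loop over positive sets.
    frames: list of scalar 'descriptors' (hypothetical stand-in)."""
    # Seed: temporal neighborhoods of each frame.
    positives = {i: {j for j in range(len(frames))
                     if 0 < abs(i - j) <= temporal_window}
                 for i in range(len(frames))}
    for _ in range(n_iters):
        # (1) Representation learning stand-in: use the raw descriptor
        # as the embedding (a real system would retrain a network here).
        embed = frames
        # (2) Expansion: add current feature-space neighbors.
        for i in range(len(frames)):
            for j in range(len(frames)):
                if i != j and abs(embed[i] - embed[j]) < feat_thresh:
                    positives[i].add(j)
        # (3) Contraction: keep only pairs passing a stricter check,
        # standing in for geometric verification.
        for i in range(len(frames)):
            positives[i] = {j for j in positives[i]
                            if abs(frames[i] - frames[j]) < geo_thresh}
    return positives
```

The point of the structure is that expansion lets the positive set grow beyond the temporal seed while contraction removes false positives, so the two steps keep the self-supervised labels from drifting.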
arXiv Detail & Related papers (2022-08-19T12:59:46Z)
- Provably Sample-Efficient RL with Side Information about Latent Dynamics [12.461789905893026]
We study reinforcement learning in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space.
We present an algorithm, called TASID, that learns a robust policy in the target domain, with sample complexity that is polynomial in the horizon.
arXiv Detail & Related papers (2022-05-27T21:07:03Z)
- On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification [66.58450185833479]
In this paper, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training these two tasks in a mutually beneficial manner, our model learns higher quality modality-shared and ID-related features.
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2022-01-11T09:44:00Z)
- Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-the-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z)
- Domain-invariant NBV Planner for Active Cross-domain Self-localization [0.0]
We develop a system for active self-localization using sparse invariant landmarks and dense discriminative landmarks.
In experiments, we demonstrate that the proposed method is effective both in efficient landmark detection and in discriminative self-localization.
arXiv Detail & Related papers (2021-02-23T07:36:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.