Deep SIMBAD: Active Landmark-based Self-localization Using Ranking-based Scene Descriptor
- URL: http://arxiv.org/abs/2109.02786v1
- Date: Mon, 6 Sep 2021 23:51:27 GMT
- Title: Deep SIMBAD: Active Landmark-based Self-localization Using Ranking-based Scene Descriptor
- Authors: Tanaka Kanji
- Abstract summary: We consider an active self-localization task by an active observer and present a novel reinforcement learning (RL)-based next-best-view (NBV) planner.
Experiments using the public NCLT dataset validated the effectiveness of the proposed approach.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Landmark-based robot self-localization has recently garnered interest as a
highly-compressive domain-invariant approach for performing visual place
recognition (VPR) across domains (e.g., time of day, weather, and season).
However, landmark-based self-localization can be an ill-posed problem for a
passive observer (e.g., manual robot control), as many viewpoints may not
provide an effective landmark view. In this study, we consider an active
self-localization task by an active observer and present a novel reinforcement
learning (RL)-based next-best-view (NBV) planner. Our contributions are as
follows. (1) SIMBAD-based VPR: We formulate the problem of landmark-based
compact scene description as SIMBAD (similarity-based pattern recognition) and
further present its deep learning extension. (2) VPR-to-NBV knowledge transfer:
We address the challenge of RL under uncertainty (i.e., active
self-localization) by transferring the state recognition ability of VPR to the
NBV. (3) NNQL-based NBV: We regard the available VPR as the experience database
by adapting nearest-neighbor approximation of Q-learning (NNQL). The result
shows an extremely compact data structure that compresses both the VPR and NBV
into a single incremental inverted index. Experiments using the public NCLT
dataset validated the effectiveness of the proposed approach.
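The abstract's combination of a ranking-based scene descriptor, an inverted index, and nearest-neighbor Q-value approximation can be illustrated with a minimal sketch. This is not the paper's implementation: the names `ranking_descriptor` and `InvertedIndex`, the `top_k` parameter, and the rank-weighted voting formula are all hypothetical, chosen only to show how one index could serve both place lookup and action selection.

```python
from collections import defaultdict


def ranking_descriptor(similarities, top_k=2):
    """Describe a scene by the IDs of its top_k most similar landmarks, best first."""
    order = sorted(range(len(similarities)), key=lambda i: (-similarities[i], i))
    return order[:top_k]


class InvertedIndex:
    """Hypothetical index: landmark ID -> stored experiences that observed it."""

    def __init__(self):
        self.postings = defaultdict(list)  # landmark_id -> [(entry_id, rank)]
        self.entries = []                  # entry_id -> (action, q_value)

    def add(self, descriptor, action, q_value):
        """Store one experience (scene descriptor, action taken, its Q-value)."""
        entry_id = len(self.entries)
        self.entries.append((action, q_value))
        for rank, landmark_id in enumerate(descriptor):
            self.postings[landmark_id].append((entry_id, rank))
        return entry_id

    def votes(self, descriptor):
        """Score stored entries sharing landmarks with the query.

        Assumed weighting: matches at better ranks contribute more.
        """
        scores = defaultdict(float)
        for query_rank, landmark_id in enumerate(descriptor):
            for entry_id, rank in self.postings.get(landmark_id, []):
                scores[entry_id] += 1.0 / (1 + query_rank + rank)
        return scores

    def q_estimates(self, descriptor):
        """For each action, reuse the Q-value of its best-matching stored entry."""
        best = {}  # action -> (vote, q_value)
        for entry_id, vote in self.votes(descriptor).items():
            action, q_value = self.entries[entry_id]
            if action not in best or vote > best[action][0]:
                best[action] = (vote, q_value)
        return {action: q for action, (_, q) in best.items()}

    def best_action(self, descriptor):
        """Pick the action with the highest estimated Q-value, or None if unseen."""
        estimates = self.q_estimates(descriptor)
        if not estimates:
            return None
        return max(estimates, key=estimates.get)
```

In this sketch, adding an experience only appends to posting lists, which is what makes the structure incremental; the same postings answer both "which stored view does this look like" and "which action was valuable there".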
Related papers
- Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset [94.13848736705575]
We introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms.
We apply a two-stage evaluation pipeline that is designed to precisely control the sources of information and their exposure levels.
Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance.
arXiv Detail & Related papers (2024-11-05T23:26:10Z)
- MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection [107.15164718585666]
We investigate the root cause of VLMs' biased prediction under the open vocabulary detection context.
Our observations lead to a simple yet effective paradigm, coded MarvelOVD, that generates significantly better training targets.
Our method outperforms other state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2024-07-31T09:23:57Z)
- EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition [6.996304653818122]
We propose a simple yet powerful approach to better exploit the potential of a foundation model for Visual Place Recognition.
We first demonstrate that features extracted from self-attention layers can serve as a powerful re-ranker for VPR.
We then demonstrate that a single-stage method leveraging internal ViT layers for pooling can generate global features that achieve state-of-the-art results.
arXiv Detail & Related papers (2024-05-28T11:24:41Z)
- OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition [10.39935021754015]
We develop OverlapMamba, a novel network for place recognition as sequences.
Our method effectively detects loop closures even when traversing previously visited locations from different directions.
Relying on raw range view inputs, it outperforms typical LiDAR and multi-view combination methods in time complexity and speed.
arXiv Detail & Related papers (2024-05-13T17:46:35Z)
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- CPR++: Object Localization via Single Coarse Point Supervision [55.8671776333499]
Coarse point refinement (CPR) is the first attempt to alleviate semantic variance from an algorithmic perspective.
CPR reduces semantic variance by selecting a semantic centre point in a neighbourhood region to replace the initial annotated point.
CPR++ can obtain scale information and further reduce the semantic variance in a global region.
arXiv Detail & Related papers (2024-01-30T17:38:48Z)
- Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
A new paradigm has emerged that generates a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z)
- Self-Supervised Place Recognition by Refining Temporal and Featural Pseudo Labels from Panoramic Data [16.540900776820084]
We propose a novel framework named TF-VPR that uses temporal neighborhoods and learnable feature neighborhoods to discover unknown spatial neighborhoods.
Our method outperforms self-supervised baselines in recall rate, robustness, and heading diversity.
arXiv Detail & Related papers (2022-08-19T12:59:46Z)
- Provably Sample-Efficient RL with Side Information about Latent Dynamics [12.461789905893026]
We study reinforcement learning in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space.
We present an algorithm, called TASID, that learns a robust policy in the target domain, with sample complexity that is polynomial in the horizon.
arXiv Detail & Related papers (2022-05-27T21:07:03Z)
- On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification [66.58450185833479]
In this paper, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training these two tasks in a mutually beneficial manner, our model learns higher quality modality-shared and ID-related features.
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2022-01-11T09:44:00Z)
- Domain-invariant NBV Planner for Active Cross-domain Self-localization [0.0]
We develop a system for active self-localization using sparse invariant landmarks and dense discriminative landmarks.
In experiments, we demonstrate that the proposed method is effective both in efficient landmark detection and in discriminative self-localization.
arXiv Detail & Related papers (2021-02-23T07:36:45Z)