Deep SIMBAD: Active Landmark-based Self-localization Using Ranking-based Scene Descriptor
- URL: http://arxiv.org/abs/2109.02786v1
- Date: Mon, 6 Sep 2021 23:51:27 GMT
- Title: Deep SIMBAD: Active Landmark-based Self-localization Using Ranking-based Scene Descriptor
- Authors: Tanaka Kanji
- Abstract summary: We consider an active self-localization task by an active observer and present a novel reinforcement learning (RL)-based next-best-view (NBV) planner.
Experiments using the public NCLT dataset validated the effectiveness of the proposed approach.
- Score: 5.482532589225552
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Landmark-based robot self-localization has recently garnered interest as a
highly-compressive domain-invariant approach for performing visual place
recognition (VPR) across domains (e.g., time of day, weather, and season).
However, landmark-based self-localization can be an ill-posed problem for a
passive observer (e.g., manual robot control), as many viewpoints may not
provide an effective landmark view. In this study, we consider an active
self-localization task by an active observer and present a novel reinforcement
learning (RL)-based next-best-view (NBV) planner. Our contributions are as
follows. (1) SIMBAD-based VPR: We formulate the problem of landmark-based
compact scene description as SIMBAD (similarity-based pattern recognition) and
further present its deep learning extension. (2) VPR-to-NBV knowledge transfer:
We address the challenge of RL under uncertainty (i.e., active
self-localization) by transferring the state recognition ability of VPR to the
NBV. (3) NNQL-based NBV: We regard the available VPR as the experience database
by adapting a nearest-neighbor approximation of Q-learning (NNQL). The result
is an extremely compact data structure that compresses both the VPR and NBV
into a single incremental inverted index. Experiments using the public NCLT
dataset validated the effectiveness of the proposed approach.
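The NNQL idea above can be illustrated with a small sketch. The class below is a hypothetical toy, not the paper's implementation: each past observation is stored as a ranking of landmark IDs (a stand-in for the ranking-based scene descriptor), and a single inverted index from landmark ID to experience ID serves both roles, retrieving nearest experiences for VPR and averaging their Q-values to pick the next-best view. All names, the voting-based similarity, and the Q-value scheme are illustrative assumptions.

```python
from collections import defaultdict


class NNQLPlanner:
    """Toy sketch of one inverted index serving both VPR retrieval and
    NNQL-based NBV planning. Hypothetical, not the paper's code."""

    def __init__(self, k=3):
        self.k = k                        # number of nearest experiences to use
        self.index = defaultdict(list)    # landmark ID -> experience IDs
        self.experiences = []             # (landmark ranking, action -> Q-value)

    def add_experience(self, landmark_ranking, q_values):
        """Index a past observation: its ranked landmark IDs and the
        estimated return of each viewpoint-control action."""
        exp_id = len(self.experiences)
        self.experiences.append((landmark_ranking, q_values))
        for lm in landmark_ranking:
            self.index[lm].append(exp_id)

    def nearest(self, landmark_ranking):
        """VPR step: score experiences by shared-landmark votes."""
        votes = defaultdict(int)
        for lm in landmark_ranking:
            for exp_id in self.index[lm]:
                votes[exp_id] += 1
        return sorted(votes, key=votes.get, reverse=True)[: self.k]

    def next_best_view(self, landmark_ranking):
        """NBV step: average the Q-values of the nearest experiences
        and pick the highest-value action."""
        neighbors = self.nearest(landmark_ranking)
        q = defaultdict(float)
        for exp_id in neighbors:
            for action, value in self.experiences[exp_id][1].items():
                q[action] += value / len(neighbors)
        return max(q, key=q.get)
```

Because both queries walk the same index, adding an experience is a single incremental insertion, which is the compactness property the abstract highlights.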
Related papers
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition [6.996304653818122]
We propose a simple yet powerful approach to better exploit the potential of a foundation model for Visual Place Recognition.
We first demonstrate that features extracted from self-attention layers can serve as a powerful re-ranker for VPR.
We then demonstrate that a single-stage method leveraging internal ViT layers for pooling can generate global features that achieve state-of-the-art results.
arXiv Detail & Related papers (2024-05-28T11:24:41Z)
- OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition [10.39935021754015]
We develop OverlapMamba, a novel network for place recognition as sequences.
Our method effectively detects loop closures even when traversing previously visited locations from different directions.
Relying on raw range view inputs, it outperforms typical LiDAR and multi-view combination methods in time complexity and speed.
arXiv Detail & Related papers (2024-05-13T17:46:35Z)
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- CPR++: Object Localization via Single Coarse Point Supervision [55.8671776333499]
Coarse point refinement (CPR) is the first attempt to alleviate semantic variance from an algorithmic perspective.
CPR reduces semantic variance by selecting a semantic centre point in a neighbourhood region to replace the initial annotated point.
CPR++ can obtain scale information and further reduce the semantic variance in a global region.
arXiv Detail & Related papers (2024-01-30T17:38:48Z)
- Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
A new paradigm has emerged that generates a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z)
- Self-Supervised Visual Place Recognition by Mining Temporal and Feature Neighborhoods [17.852415436033436]
We propose a novel framework named TF-VPR that uses temporal neighborhoods and learnable feature neighborhoods to discover unknown spatial neighborhoods.
Our method follows an iterative training paradigm which alternates between: (1) representation learning with data augmentation, (2) positive set expansion to include the current feature space neighbors, and (3) positive set contraction via geometric verification.
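The three alternating steps can be sketched as a toy loop. Everything below is a hypothetical stand-in, not the authors' code: frames are 1-D "descriptors", temporal neighbors seed the positive set, a feature-distance threshold plays the role of the learned feature neighborhood, and a stricter threshold stands in for geometric verification.

```python
def tf_vpr_positives(frames, temporal_window=1, feat_thresh=0.5,
                     geo_thresh=0.3, n_iters=2):
    """Toy sketch of the expand/contract loop over positive sets.
    frames: list of scalar 'descriptors' (hypothetical stand-in)."""
    # Seed: temporal neighborhoods of each frame.
    positives = {i: {j for j in range(len(frames))
                     if 0 < abs(i - j) <= temporal_window}
                 for i in range(len(frames))}
    for _ in range(n_iters):
        # (1) Representation learning stand-in: use the raw descriptor
        # as the embedding (a real system would retrain a network here).
        embed = frames
        # (2) Expansion: add current feature-space neighbors.
        for i in range(len(frames)):
            for j in range(len(frames)):
                if i != j and abs(embed[i] - embed[j]) < feat_thresh:
                    positives[i].add(j)
        # (3) Contraction: keep only pairs passing a stricter check,
        # standing in for geometric verification.
        for i in range(len(frames)):
            positives[i] = {j for j in positives[i]
                            if abs(frames[i] - frames[j]) < geo_thresh}
    return positives
```

The point of the structure is that expansion lets the positive set grow beyond the temporal seed while contraction removes false positives, so the two steps keep the self-supervised labels from drifting.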
arXiv Detail & Related papers (2022-08-19T12:59:46Z)
- Provably Sample-Efficient RL with Side Information about Latent Dynamics [12.461789905893026]
We study reinforcement learning in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space.
We present an algorithm, called TASID, that learns a robust policy in the target domain, with sample complexity that is polynomial in the horizon.
arXiv Detail & Related papers (2022-05-27T21:07:03Z)
- On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification [66.58450185833479]
In this paper, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training these two tasks in a mutually beneficial manner, our model learns higher quality modality-shared and ID-related features.
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2022-01-11T09:44:00Z)
- Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-the-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z)
- Domain-invariant NBV Planner for Active Cross-domain Self-localization [0.0]
We develop a system for active self-localization using sparse invariant landmarks and dense discriminative landmarks.
In experiments, we demonstrate that the proposed method is effective both in efficient landmark detection and in discriminative self-localization.
arXiv Detail & Related papers (2021-02-23T07:36:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.