Improving Visual Place Recognition Based Robot Navigation By Verifying Localization Estimates
- URL: http://arxiv.org/abs/2407.08162v2
- Date: Tue, 19 Nov 2024 03:30:11 GMT
- Title: Improving Visual Place Recognition Based Robot Navigation By Verifying Localization Estimates
- Authors: Owen Claxton, Connor Malone, Helen Carson, Jason Ford, Gabe Bolton, Iman Shames, Michael Milford
- Abstract summary: This research introduces a novel Multi-Layer Perceptron (MLP) integrity monitor.
It demonstrates improved performance and generalizability, removing per-environment training and reducing manual tuning requirements.
We test our proposed system in extensive real-world experiments.
- Score: 14.354164363224529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Place Recognition (VPR) systems often have imperfect performance, affecting the `integrity' of position estimates and subsequent robot navigation decisions. Previously, SVM classifiers have been used to monitor VPR integrity. This research introduces a novel Multi-Layer Perceptron (MLP) integrity monitor which demonstrates improved performance and generalizability, removing per-environment training and reducing manual tuning requirements. We test our proposed system in extensive real-world experiments, presenting two real-time integrity-based VPR verification methods: a single-query rejection method for robot navigation to a goal zone (Experiment 1); and a history-of-queries method that takes a best, verified, match from its recent trajectory and uses an odometer to extrapolate a current position estimate (Experiment 2). Noteworthy results for Experiment 1 include a decrease in aggregate mean along-track goal error from ~9.8m to ~3.1m, and an increase in the aggregate rate of successful mission completion from ~41% to ~55%. Experiment 2 showed a decrease in aggregate mean along-track localization error from ~2.0m to ~0.5m, and an increase in the aggregate localization precision from ~97% to ~99%. Overall, our results demonstrate the practical usefulness of a VPR integrity monitor in real-world robotics to improve VPR localization and consequent navigation performance.
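The minimal sketch below illustrates the two verification ideas described in the abstract: an MLP that scores how trustworthy a single VPR match appears, a single-query gate in the style of Experiment 1, and a history-of-queries fallback in the style of Experiment 2 that extrapolates the newest verified match forward with odometry. The input features, layer sizes, threshold, and odometry interface are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch only: feature vector, network sizes, threshold, and odometry
# handling are assumptions for illustration, not the paper's actual design.
import torch
import torch.nn as nn


class IntegrityMLP(nn.Module):
    """Scores how trustworthy a single VPR match appears (1 = likely correct)."""

    def __init__(self, n_features: int = 8, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(x))


def accept_single_query(monitor: IntegrityMLP, features, threshold: float = 0.5) -> bool:
    """Experiment-1 style gating: use the current match only if the monitor accepts it."""
    score = monitor(torch.as_tensor(features, dtype=torch.float32)).item()
    return score >= threshold


def estimate_from_history(monitor: IntegrityMLP, history, odom_now, threshold: float = 0.5):
    """Experiment-2 style estimate: take the most recent verified match from the
    recent trajectory and extrapolate it to the present with odometry.

    `history` holds (features, map_position, odom_at_match) tuples, newest last;
    positions and odometry readings are assumed to be numpy arrays in a common frame.
    """
    for features, map_position, odom_at_match in reversed(history):
        if accept_single_query(monitor, features, threshold):
            return map_position + (odom_now - odom_at_match)
    return None  # no verified match available; the caller must handle this case
```

In the single-query mode, a rejected match simply means no position update is issued for that query; in the history mode, the robot continues navigating on the freshest estimate that passed verification.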
Related papers
- STRMs: Spatial Temporal Reasoning Models for Vision-Based Localization Rivaling GPS Precision [3.671692919685993]
We introduce two sequential generative models, VAE-RNN and VAE-Transformer, which transform first-person perspective observations into global map perspective representations.
We evaluate these models across two real-world environments: a university campus navigated by a Jackal robot and an urban downtown area navigated by a Tesla sedan.
arXiv Detail & Related papers (2025-03-11T00:38:54Z)
- DiRecNetV2: A Transformer-Enhanced Network for Aerial Disaster Recognition [4.678150356894011]
The integration of Unmanned Aerial Vehicles (UAVs) with artificial intelligence (AI) models for aerial imagery processing in disaster assessment requires exceptional accuracy, computational efficiency, and real-time processing capabilities.
Convolutional Neural Networks (CNNs) are traditionally efficient at local feature extraction but limited in their capacity for global context interpretation.
Vision Transformers (ViTs) show promise for improved global context interpretation through the use of attention mechanisms, although they remain underinvestigated in UAV-based disaster response applications.
arXiv Detail & Related papers (2024-10-17T15:25:13Z)
- Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition [72.35438297011176]
We propose a novel method to realize seamless adaptation of pre-trained models for visual place recognition (VPR).
Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method.
Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time.
arXiv Detail & Related papers (2024-02-22T12:55:01Z)
- CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection [3.849401956130233]
We explore the effectiveness of pre-trained vision-language models (VLMs) when paired with recent adaptation methods for universal deepfake detection.
We employ only a single dataset (ProGAN) in order to adapt CLIP for deepfake detection.
The simple and lightweight Prompt Tuning-based adaptation strategy outperforms the previous SOTA approach by 5.01% mAP and 6.61% accuracy.
arXiv Detail & Related papers (2024-02-20T11:26:42Z) - Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute with regard to the previous SoTA) to a significant new best of 80% single-run success rate on the R2R test split by simple imitation learning.
arXiv Detail & Related papers (2023-07-28T16:03:28Z)
- Tightly-coupled Visual-DVL-Inertial Odometry for Robot-based Ice-water Boundary Exploration [8.555466536537292]
We present a multi-sensor fusion framework to increase localization accuracy.
Visual images, a Doppler Velocity Log (DVL), an Inertial Measurement Unit (IMU), and a pressure sensor are integrated.
The proposed method is validated with a data set collected in the field under frozen ice.
arXiv Detail & Related papers (2023-03-29T20:16:39Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Visual-tactile sensing for Real-time liquid Volume Estimation in Grasping [58.50342759993186]
We propose a visuo-tactile model for real-time estimation of the liquid volume inside a deformable container.
We fuse two sensory modalities, i.e., the raw visual inputs from the RGB camera and the tactile cues from our specific tactile sensor.
The robotic system is well controlled and adjusted based on the estimation model in real time.
arXiv Detail & Related papers (2022-02-23T13:38:31Z)
- Improved YOLOv5 network for real-time multi-scale traffic sign detection [4.5598087061051755]
We propose an improved feature pyramid model, named AF-FPN, which utilizes the adaptive attention module (AAM) and feature enhancement module (FEM) to reduce information loss in the process of feature map generation.
We replace the original feature pyramid network in YOLOv5 with AF-FPN, which improves the detection performance for multi-scale targets of the YOLOv5 network.
arXiv Detail & Related papers (2021-12-16T11:02:12Z)
- FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose a design paradigm for a cost-effective network with low-resolution (LR) representation for efficient pose estimation, named FasterPose.
We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence.
Compared with the previously dominant network for pose estimation, our method reduces FLOPs by 58% while gaining a 1.3% improvement in accuracy.
arXiv Detail & Related papers (2021-07-07T13:39:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.