Towards an Accurate and Effective Robot Vision (The Problem of Topological Localization for Mobile Robots)
- URL: http://arxiv.org/abs/2509.04948v1
- Date: Fri, 05 Sep 2025 09:14:59 GMT
- Title: Towards an Accurate and Effective Robot Vision (The Problem of Topological Localization for Mobile Robots)
- Authors: Emanuela Boros
- Abstract summary: This work addresses topological localization in an office environment using only images acquired with a perspective color camera mounted on a robot platform. We evaluate state-of-the-art visual descriptors, including Color Histograms, SIFT, ASIFT, RGB-SIFT, and Bag-of-Visual-Words approaches inspired by text retrieval.
- Score: 0.43064121494080315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topological localization is a fundamental problem in mobile robotics, since robots must be able to determine their position in order to accomplish tasks. Visual localization and place recognition are challenging due to perceptual ambiguity, sensor noise, and illumination variations. This work addresses topological localization in an office environment using only images acquired with a perspective color camera mounted on a robot platform, without relying on temporal continuity of image sequences. We evaluate state-of-the-art visual descriptors, including Color Histograms, SIFT, ASIFT, RGB-SIFT, and Bag-of-Visual-Words approaches inspired by text retrieval. Our contributions include a systematic, quantitative comparison of these features, distance measures, and classifiers. Performance was analyzed using standard evaluation metrics and visualizations, extending previous experiments. Results demonstrate the advantages of proper configurations of appearance descriptors, similarity measures, and classifiers. The quality of these configurations was further validated in the Robot Vision task of the ImageCLEF evaluation campaign, where the system identified the most likely location of novel image sequences. Future work will explore hierarchical models, ranking methods, and feature combinations to build more robust localization systems, reducing training and runtime while avoiding the curse of dimensionality. Ultimately, this aims toward integrated, real-time localization across varied illumination and longer routes.
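As an illustration of the descriptor/distance/classifier pipelines compared in this work, the sketch below builds a Bag-of-Visual-Words representation from SIFT descriptors and assigns a query image to the most likely location with a nearest-neighbor rule. It is a minimal sketch, not the paper's implementation: it assumes OpenCV's SIFT and scikit-learn's KMeans and KNeighborsClassifier are available, and all function and variable names are illustrative.

```python
# Minimal Bag-of-Visual-Words place classification sketch (illustrative only).
# Assumes OpenCV (cv2) and scikit-learn; names are hypothetical, not from the paper.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

sift = cv2.SIFT_create()

def sift_descriptors(image_path):
    """Extract SIFT descriptors from a grayscale image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

def bovw_histogram(desc, vocabulary):
    """Quantize descriptors against the visual vocabulary and L1-normalize."""
    hist = np.zeros(vocabulary.n_clusters, dtype=np.float32)
    if len(desc):
        for word in vocabulary.predict(desc):
            hist[word] += 1.0
        hist /= hist.sum()
    return hist

def train(train_paths, train_labels, n_words=200):
    """Cluster all training descriptors into a vocabulary, then fit a 1-NN classifier."""
    all_desc = np.vstack([sift_descriptors(p) for p in train_paths])
    vocabulary = KMeans(n_clusters=n_words, n_init=10).fit(all_desc)
    X = np.array([bovw_histogram(sift_descriptors(p), vocabulary) for p in train_paths])
    clf = KNeighborsClassifier(n_neighbors=1, metric="manhattan").fit(X, train_labels)
    return vocabulary, clf

def localize(query_path, vocabulary, clf):
    """Predict the most likely room/location label for a query image."""
    h = bovw_histogram(sift_descriptors(query_path), vocabulary)
    return clf.predict([h])[0]
```

Swapping the Manhattan distance for chi-square or cosine similarity, or the 1-NN rule for an SVM, reproduces the kind of configuration comparison described in the abstract.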
Related papers
- Efficient Surgical Robotic Instrument Pose Reconstruction in Real World Conditions Using Unified Feature Detection [21.460727996614704]
MIS robots have long kinematic chains and partial visibility of their degrees of freedom in the camera. We propose a novel framework that unifies the detection of geometric primitives through a shared encoding. This architecture detects both keypoints and edges in a single inference and is trained on large-scale synthetic data with projective labeling.
arXiv Detail & Related papers (2025-10-03T22:03:28Z)
- Graph-based Robot Localization Using a Graph Neural Network with a Floor Camera and a Feature Rich Industrial Floor [0.0]
We propose an innovative framework that harnesses flooring characteristics by employing graph-based representations and Graph Convolutional Networks (GCNs). Our method uses graphs to represent floor features, which helps localize the robot more accurately (0.64 cm error) and more efficiently than comparing individual image features.
arXiv Detail & Related papers (2025-08-08T09:46:28Z)
- Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments [10.953629652228024]
Vision-and-Language Navigation (VLN) agents associate time-sequenced visual observations with corresponding instructions to make decisions. In this paper, we address the mismatch between human-centric instructions and quadruped robots with a low-height field of view. We propose a Ground-level Viewpoint Navigation (GVNav) approach to mitigate this issue.
arXiv Detail & Related papers (2025-02-26T10:30:40Z)
- Exploring Emerging Trends and Research Opportunities in Visual Place Recognition [28.76562316749074]
Vision-based recognition is a long-standing challenge in the computer vision and robotics communities.
Visual place recognition is vital for most localization implementations.
Researchers have recently turned their attention to vision-language models.
arXiv Detail & Related papers (2024-11-18T11:36:17Z)
- Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information [68.10033984296247]
This paper explores the domain of active localization, emphasizing the importance of viewpoint selection to enhance localization accuracy.
Our contributions involve using a data-driven approach with a simple architecture designed for real-time operation, a self-supervised data training method, and the capability to consistently integrate our map into a planning framework tailored for real-world robotics applications.
arXiv Detail & Related papers (2024-07-22T12:32:09Z)
- ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent camouflaged object detection (COD) approaches attempt to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior of zooming in and out when observing vague images and camouflaged objects.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z)
- Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach [47.373245682678515]
This work investigates how active visual localization can be used to overcome challenges of viewpoint changes.
Specifically, we focus on the problem of selecting the optimal viewpoint at a given location.
The result demonstrates the superior performance of the data-driven approach when compared to existing methods.
arXiv Detail & Related papers (2023-10-04T08:18:30Z)
- Learning-based Relational Object Matching Across Views [63.63338392484501]
We propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images.
We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network.
arXiv Detail & Related papers (2023-05-03T19:36:51Z)
- Sparse Image based Navigation Architecture to Mitigate the need of precise Localization in Mobile Robots [3.1556608426768324]
This paper focuses on mitigating the need for exact localization of a mobile robot to pursue autonomous navigation using a sparse set of images.
The proposed method consists of a model architecture - RoomNet, for unsupervised learning resulting in a coarse identification of the environment.
The latter uses sparse image matching to characterise the similarity of the frames acquired at run time vis-a-vis the frames viewed by the robot during the mapping and training stage.
arXiv Detail & Related papers (2022-03-29T06:38:18Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
- Point Cloud Based Reinforcement Learning for Sim-to-Real and Partial Observability in Visual Navigation [62.22058066456076]
Reinforcement Learning (RL) provides powerful tools for solving complex robotic tasks.
However, policies trained in simulation often do not transfer directly to the real world, which is known as the sim-to-real transfer problem.
We propose a method that learns on an observation space constructed by point clouds and environment randomization.
arXiv Detail & Related papers (2020-07-27T17:46:59Z)
- Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)
- Real-Time Object Detection and Recognition on Low-Compute Humanoid Robots using Deep Learning [0.12599533416395764]
We describe a novel architecture that enables multiple low-compute NAO robots to perform real-time detection, recognition and localization of objects in their camera views.
The proposed algorithm for object detection and localization is an empirical modification of YOLOv3, based on indoor experiments in multiple scenarios.
The architecture also comprises an effective end-to-end pipeline that feeds the real-time frames from the camera to the neural network and uses its results to guide the robot.
arXiv Detail & Related papers (2020-01-20T05:24:58Z)