Sight Over Site: Perception-Aware Reinforcement Learning for Efficient Robotic Inspection
- URL: http://arxiv.org/abs/2509.17877v1
- Date: Mon, 22 Sep 2025 15:14:02 GMT
- Title: Sight Over Site: Perception-Aware Reinforcement Learning for Efficient Robotic Inspection
- Authors: Richard Kuhlmann, Jakob Wolfram, Boyang Sun, Jiaxu Xing, Davide Scaramuzza, Marc Pollefeys, Cesar Cadena
- Abstract summary: In this work, we revisit inspection from a perception-aware perspective. We propose an end-to-end reinforcement learning framework that explicitly incorporates target visibility as the primary objective. We show that our method outperforms existing classical and learning-based navigation approaches.
- Score: 57.37596278863949
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous inspection is a central problem in robotics, with applications ranging from industrial monitoring to search-and-rescue. Traditionally, inspection has often been reduced to navigation tasks, where the objective is to reach a predefined location while avoiding obstacles. However, this formulation captures only part of the real inspection problem. In real-world environments, the inspection targets may become visible well before their exact coordinates are reached, making further movement both redundant and inefficient. What matters more for inspection is not simply arriving at the target's position, but positioning the robot at a viewpoint from which the target becomes observable. In this work, we revisit inspection from a perception-aware perspective. We propose an end-to-end reinforcement learning framework that explicitly incorporates target visibility as the primary objective, enabling the robot to find the shortest trajectory that guarantees visual contact with the target without relying on a map. The learned policy leverages both perceptual and proprioceptive sensing and is trained entirely in simulation, before being deployed to a real-world robot. We further develop an algorithm to compute ground-truth shortest inspection paths, which provides a reference for evaluation. Through extensive experiments, we show that our method outperforms existing classical and learning-based navigation approaches, yielding more efficient inspection trajectories in both simulated and real-world settings. The project is available at https://sight-over-site.github.io/
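The abstract describes rewarding visual contact with the target rather than arrival at its coordinates. A minimal sketch of such a perception-aware reward term is shown below; the function name, step cost, and bonus values are illustrative assumptions, not the paper's actual reward design.

```python
def inspection_reward(target_visible: bool,
                      step_cost: float = 0.01,
                      visibility_bonus: float = 1.0) -> float:
    """Hypothetical per-step reward for perception-aware inspection.

    A small time penalty encourages short trajectories, while the
    bonus fires when the target enters the robot's field of view,
    making visibility (not position) the primary objective.
    """
    reward = -step_cost  # penalize every step to favor shorter paths
    if target_visible:
        reward += visibility_bonus  # objective: target in view
    return reward
```

Combined with termination on sustained visibility, a policy maximizing this return would trade off path length against reaching an observable viewpoint.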
Related papers
- To Move or Not to Move: Constraint-based Planning Enables Zero-Shot Generalization for Interactive Navigation [14.745622942938532]
In real-world scenarios, such as home environments and warehouses, clutter can block all routes. We introduce the Lifelong Interactive Navigation problem, where a mobile robot can move clutter to forge its own path. We propose an LLM-driven, constraint-based planning framework with active perception.
arXiv Detail & Related papers (2026-02-23T17:10:00Z) - TANGO: Traversability-Aware Navigation with Local Metric Control for Topological Goals [10.69725316052444]
We present a novel RGB-only, object-level topometric navigation pipeline that enables zero-shot, long-horizon robot navigation. Our approach integrates global topological path planning with local metric trajectory control, allowing the robot to navigate towards object-level sub-goals while avoiding obstacles. We demonstrate the effectiveness of our method in both simulated environments and real-world tests, highlighting its robustness and deployability.
arXiv Detail & Related papers (2025-09-10T15:43:32Z) - A Cross-Scene Benchmark for Open-World Drone Active Tracking [54.235808061746525]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations. We propose a unified cross-scene cross-domain benchmark for open-world drone active tracking called DAT. We also propose a reinforcement learning-based drone tracking method called R-VAT.
arXiv Detail & Related papers (2024-12-01T09:37:46Z) - Self-Supervised Object Goal Navigation with In-Situ Finetuning [110.6053241629366]
This work presents an agent that builds self-supervised models of the world via exploration.
We identify a strong source of self-supervision that can train all components of an ObjectNav agent.
We show that our agent can perform competitively in the real world and simulation.
arXiv Detail & Related papers (2022-12-09T03:41:40Z) - Challenges in Visual Anomaly Detection for Mobile Robots [65.53820325712455]
We consider the task of detecting anomalies for autonomous mobile robots based on vision.
We categorize relevant types of visual anomalies and discuss how they can be detected by unsupervised deep learning methods.
arXiv Detail & Related papers (2022-09-22T13:26:46Z) - See What the Robot Can't See: Learning Cooperative Perception for Visual Navigation [11.943412856714154]
We train the sensors to encode and communicate relevant viewpoint information to the mobile robot.
We overcome the challenge of enabling all the sensors to predict the direction along the shortest path to the target.
Our results show that by using communication between the sensors and the robot, we achieve up to 2.0x improvement in SPL.
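The 2.0x improvement above is reported in SPL (Success weighted by Path Length, Anderson et al., 2018), a standard navigation metric that weights each successful episode by the ratio of shortest to actual path length. A small sketch of the computation:

```python
def spl(successes, shortest_lengths, actual_lengths):
    """Success weighted by Path Length:
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i),
    where S_i is success (0/1), l_i the shortest-path length,
    and p_i the path length the agent actually traveled.
    """
    n = len(successes)
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, actual_lengths):
        if s:  # failed episodes contribute zero
            total += l / max(p, l)  # penalize detours on successes
    return total / n
```

An agent that succeeds on every episode but travels twice the shortest path scores 0.5, the same as one that takes optimal paths but succeeds only half the time.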
arXiv Detail & Related papers (2022-08-01T11:37:01Z) - Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z) - Robot Localization and Navigation through Predictive Processing using LiDAR [0.0]
We show a proof-of-concept of the predictive processing-inspired approach to perception applied for localization and navigation using laser sensors.
We learn the generative model of the laser through self-supervised learning and perform both online state-estimation and navigation.
Results showed improved state-estimation performance when compared to a state-of-the-art particle filter in the absence of odometry.
arXiv Detail & Related papers (2021-09-09T09:58:00Z) - Rapid Exploration for Open-World Navigation with Latent Goal Models [78.45339342966196]
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments.
At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images.
We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration.
arXiv Detail & Related papers (2021-04-12T23:14:41Z) - ViNG: Learning Open-World Navigation with Visual Goals [82.84193221280216]
We propose a learning-based navigation system for reaching visually indicated goals.
We show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning.
We demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection.
arXiv Detail & Related papers (2020-12-17T18:22:32Z) - Reinforcement Learning for UAV Autonomous Navigation, Mapping and Target Detection [36.79380276028116]
We study a joint detection, mapping and navigation problem for a single unmanned aerial vehicle (UAV) equipped with a low complexity radar and flying in an unknown environment.
The goal is to optimize its trajectory so as to maximize mapping accuracy while avoiding areas where measurements might not be sufficiently informative for target detection.
arXiv Detail & Related papers (2020-05-05T20:39:18Z) - RetinaTrack: Online Single Stage Joint Detection and Tracking [22.351109024452462]
We focus on the tracking-by-detection paradigm for autonomous driving where both tasks are mission critical.
We propose a conceptually simple and efficient joint model of detection and tracking, called RetinaTrack, which modifies the popular single stage RetinaNet approach.
arXiv Detail & Related papers (2020-03-30T23:46:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.