A Modular Robotic System for Autonomous Exploration and Semantic Updating in Large-Scale Indoor Environments
- URL: http://arxiv.org/abs/2409.15493v3
- Date: Fri, 19 Sep 2025 21:22:29 GMT
- Title: A Modular Robotic System for Autonomous Exploration and Semantic Updating in Large-Scale Indoor Environments
- Authors: Sai Haneesh Allu, Itay Kadosh, Tyler Summers, Yu Xiang
- Abstract summary: We present a modular robotic system for autonomous exploration and semantic updating of large-scale unknown environments. Our approach enables a mobile robot to build, revisit, and update a hybrid semantic map that integrates a 2D occupancy grid for geometry with a topological graph for object semantics.
- Score: 2.8252013503798907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a modular robotic system for autonomous exploration and semantic updating of large-scale unknown environments. Our approach enables a mobile robot to build, revisit, and update a hybrid semantic map that integrates a 2D occupancy grid for geometry with a topological graph for object semantics. Unlike prior methods that rely on manual teleoperation or precollected datasets, our two-phase approach achieves end-to-end autonomy: first, a modified frontier-based exploration algorithm with dynamic search windows constructs a geometric map; second, using a greedy trajectory planner, environments are revisited, and object semantics are updated using open-vocabulary object detection and segmentation. This modular system, compatible with any metric SLAM framework, supports continuous operation by efficiently updating the semantic graph to reflect short-term and long-term changes such as object relocation, removal, or addition. We validate the approach on a Fetch robot in real-world indoor environments of approximately $8,500$m$^2$ and $117$m$^2$, demonstrating robust and scalable semantic mapping and continuous adaptation, marking a fully autonomous integration of exploration, mapping, and semantic updating on a physical robot.
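As a rough illustration of the two planning ideas named in the abstract, the sketch below detects frontier cells on a 2D occupancy grid inside a search window that grows until a frontier is found, and then orders revisit waypoints greedily by nearest-first distance. This is not the authors' implementation: the cell labels, the window-doubling rule, the function names, and the Euclidean distance metric are all assumptions made for illustration.

```python
from math import hypot

# Hypothetical cell labels for the occupancy grid (assumed for this sketch).
FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def find_frontiers(grid, center, half_window):
    """Return free cells inside a square window that touch an unknown cell."""
    rows, cols = len(grid), len(grid[0])
    r0, c0 = center
    frontiers = []
    for r in range(max(0, r0 - half_window), min(rows, r0 + half_window + 1)):
        for c in range(max(0, c0 - half_window), min(cols, c0 + half_window + 1)):
            if grid[r][c] != FREE:
                continue
            neighbors = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if any(0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN
                   for nr, nc in neighbors):
                frontiers.append((r, c))
    return frontiers


def find_frontiers_dynamic(grid, center, start=1):
    """Double the window size until a frontier appears or the map is covered."""
    limit = max(len(grid), len(grid[0]))
    half_window = start
    while True:
        found = find_frontiers(grid, center, half_window)
        if found or half_window >= limit:
            return found, half_window
        half_window *= 2


def greedy_order(start, waypoints):
    """Visit waypoints nearest-first, measured from the current position."""
    remaining = list(waypoints)
    order, current = [], start
    while remaining:
        nxt = min(remaining,
                  key=lambda p: hypot(p[0] - current[0], p[1] - current[1]))
        remaining.remove(nxt)
        order.append(nxt)
        current = nxt
    return order


if __name__ == "__main__":
    grid = [
        [0, 0, 0, -1, -1],
        [0, 0, 0, -1, -1],
        [0, 0, 1, -1, -1],
    ]
    print(find_frontiers_dynamic(grid, (0, 0)))
    print(greedy_order((0, 0), [(0, 4), (0, 1), (3, 0)]))
```

Searching a small window first keeps each frontier query cheap on a large map, at the cost of possibly missing a distant frontier until the window has grown. The greedy ordering is simple but does not guarantee the shortest overall tour.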
Related papers
- Hi-Dyna Graph: Hierarchical Dynamic Scene Graph for Robotic Autonomy in Human-Centric Environments [41.80879866951797]
Hi-Dyna Graph is a hierarchical dynamic scene graph architecture that integrates persistent global layouts with localized dynamic semantics for embodied robotic autonomy. An agent powered by large language models (LLMs) is employed to interpret the unified graph, infer latent task triggers, and generate executable instructions grounded in robotic affordances.
arXiv Detail & Related papers (2025-05-30T03:35:29Z) - Semantic Exploration and Dense Mapping of Complex Environments using Ground Robot with Panoramic LiDAR-Camera Fusion [10.438142938687326]
This paper presents a system for autonomous semantic exploration and dense semantic target mapping of a complex unknown environment using a ground robot equipped with a LiDAR-panoramic camera suite. We first redefine the task as completing both geometric coverage and semantic viewpoint observation. We then manage semantic and geometric viewpoints separately and propose a novel Priority-driven Decoupled Local Sampler to generate local viewpoint sets. In addition, we propose a Safe Aggressive Exploration State Machine, which allows aggressive exploration behavior while ensuring the robot's safety.
arXiv Detail & Related papers (2025-05-28T21:27:32Z) - FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment [16.987872206495897]
FindAnything is an open-world mapping framework that incorporates vision-language information into dense volumetric submaps.
Our system is the first of its kind to be deployed on resource-constrained devices, such as MAVs.
arXiv Detail & Related papers (2025-04-11T15:12:05Z) - Semantic Segmentation and Scene Reconstruction of RGB-D Image Frames: An End-to-End Modular Pipeline for Robotic Applications [0.7951977175758216]
Traditional RGB-D processing pipelines focus primarily on geometric reconstruction. We introduce a novel end-to-end modular pipeline that integrates semantic segmentation, human tracking, point-cloud fusion, and scene reconstruction. We validate our approach on benchmark datasets and real-world Kinect RGB-D data, demonstrating improved efficiency, accuracy, and usability.
arXiv Detail & Related papers (2024-10-23T16:01:31Z) - Neural Semantic Map-Learning for Autonomous Vehicles [85.8425492858912]
We present a mapping system that fuses local submaps gathered from a fleet of vehicles at a central instance to produce a coherent map of the road environment.
Our method jointly aligns and merges the noisy and incomplete local submaps using a scene-specific Neural Signed Distance Field.
We leverage memory-efficient sparse feature-grids to scale to large areas and introduce a confidence score to model uncertainty in scene reconstruction.
arXiv Detail & Related papers (2024-10-10T10:10:03Z) - Memorize What Matters: Emergent Scene Decomposition from Multitraverse [54.487589469432706]
We introduce 3D Gaussian Mapping, a camera-only offline mapping framework grounded in 3D Gaussian Splatting.
3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation.
We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering.
arXiv Detail & Related papers (2024-05-27T14:11:17Z) - Mapping High-level Semantic Regions in Indoor Environments without Object Recognition [50.624970503498226]
The present work proposes a method for semantic region mapping via embodied navigation in indoor environments.
To enable region identification, the method uses a vision-to-language model to provide scene information for mapping.
By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location.
arXiv Detail & Related papers (2024-03-11T18:09:50Z) - Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z) - Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models. Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z) - Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z) - Constructing Metric-Semantic Maps using Floor Plan Priors for Long-Term Indoor Localization [29.404446814219202]
In this paper, we address the task of constructing a metric-semantic map for the purpose of long-term object-based localization.
We exploit 3D object detections from monocular RGB frames both for the object-based map construction and for globally localizing in the constructed map.
We evaluate our map construction in an office building, and test our long-term localization approach on challenging sequences recorded in the same environment over nine months.
arXiv Detail & Related papers (2023-03-20T09:33:05Z) - Object Goal Navigation Based on Semantics and RGB Ego View [9.702784248870522]
This paper presents an architecture and methodology to empower a service robot to navigate an indoor environment with semantic decision making, given RGB ego view.
The robot navigates based on GeoSem map - a relational combination of geometric and semantic map.
The presented approach was found to outperform human users in gamified evaluations with respect to average completion time.
arXiv Detail & Related papers (2022-10-20T19:23:08Z) - Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation [87.52136927091712]
We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by some language instructions.
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.
We propose a multi-granularity map, which contains both object fine-grained details (e.g., color, texture) and semantic classes, to represent objects more comprehensively.
arXiv Detail & Related papers (2022-10-14T04:23:27Z) - SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding [34.19666841489646]
We show how a robot can autonomously discover novel semantic classes and improve accuracy on known classes when exploring an unknown environment.
We develop a general framework for mapping and clustering that we then use to generate a self-supervised learning signal to update a semantic segmentation model.
In particular, we show how clustering parameters can be optimized during deployment and that fusion of multiple observation modalities improves novel object discovery compared to prior work.
arXiv Detail & Related papers (2022-06-21T18:41:51Z) - Efficient Placard Discovery for Semantic Mapping During Frontier Exploration [0.0]
This work introduces an Interruptable Frontier Exploration algorithm, enabling the robot to explore its environment to construct its SLAM map while pausing to inspect placards observed during this process.
This allows the robot to autonomously discover room placards without human intervention while speeding up significantly over previous autonomous exploration methods.
arXiv Detail & Related papers (2021-10-27T20:00:07Z) - Large-scale Autonomous Flight with Real-time Semantic SLAM under Dense Forest Canopy [48.51396198176273]
We propose an integrated system that can perform large-scale autonomous flights and real-time semantic mapping in challenging under-canopy environments.
We detect and model tree trunks and ground planes from LiDAR data, which are associated across scans and used to constrain robot poses as well as tree trunk models.
A drift-compensation mechanism is designed to minimize the odometry drift using semantic SLAM outputs in real time, while maintaining planner optimality and controller stability.
arXiv Detail & Related papers (2021-09-14T07:24:53Z) - Indoor Semantic Scene Understanding using Multi-modality Fusion [0.0]
We present a semantic scene understanding pipeline that fuses 2D and 3D detection branches to generate a semantic map of the environment.
Unlike previous works that were evaluated on collected datasets, we test our pipeline on an active photo-realistic robotic environment.
Our novelty includes rectification of 3D proposals using projected 2D detections and modality fusion based on object size.
arXiv Detail & Related papers (2021-08-17T13:30:02Z) - SABER: Data-Driven Motion Planner for Autonomously Navigating Heterogeneous Robots [112.2491765424719]
We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal.
We use stochastic model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints.
Recurrent neural networks are used to provide a quick estimate of future state uncertainty considered in the SMPC finite-time horizon solution.
A Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal.
arXiv Detail & Related papers (2021-08-03T02:56:21Z) - Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems [92.26462290867963]
Kimera-Multi is the first multi-robot system that is robust and capable of identifying and rejecting incorrect inter- and intra-robot loop closures.
We demonstrate Kimera-Multi in photo-realistic simulations, SLAM benchmarking datasets, and challenging outdoor datasets collected using ground robots.
arXiv Detail & Related papers (2021-06-28T03:56:40Z) - Kimera-Multi: a System for Distributed Multi-Robot Metric-Semantic Simultaneous Localization and Mapping [57.173793973480656]
We present the first fully distributed multi-robot system for dense metric-semantic SLAM.
Our system, dubbed Kimera-Multi, is implemented by a team of robots equipped with visual-inertial sensors.
Kimera-Multi builds a 3D mesh model of the environment in real-time, where each face of the mesh is annotated with a semantic label.
arXiv Detail & Related papers (2020-11-08T21:38:12Z) - Lifelong update of semantic maps in dynamic environments [2.343080600040765]
A robot understands its world through the raw information it senses from its surroundings.
A semantic map, containing high-level information that both the robot and user understand, is better suited to be a shared representation.
We use the semantic map as the user-facing interface on our fleet of floor-cleaning robots.
arXiv Detail & Related papers (2020-10-17T18:44:33Z) - Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z) - Extending Maps with Semantic and Contextual Object Information for Robot Navigation: a Learning-Based Framework using Visual and Depth Cues [12.984393386954219]
This paper addresses the problem of building augmented metric representations of scenes with semantic information from RGB-D images.
We propose a complete framework to create an enhanced map representation of the environment with object-level information.
arXiv Detail & Related papers (2020-03-13T15:05:23Z) - Visual Semantic SLAM with Landmarks for Large-Scale Outdoor Environment [47.96314050446863]
We build a system to create a semantic 3D map by combining a 3D point cloud from ORB SLAM with semantic segmentation information from PSPNet-101 for large-scale environments.
We find a way to associate real-world landmarks with the point cloud map and build a topological map based on the semantic map.
arXiv Detail & Related papers (2020-01-04T03:34:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.