Understanding while Exploring: Semantics-driven Active Mapping
- URL: http://arxiv.org/abs/2506.00225v1
- Date: Fri, 30 May 2025 21:03:17 GMT
- Title: Understanding while Exploring: Semantics-driven Active Mapping
- Authors: Liyan Chen, Huangying Zhan, Hairong Yin, Yi Xu, Philippos Mordohai
- Abstract summary: ActiveSGM is an active semantic mapping framework designed to predict the informativeness of potential observations before execution. By enabling robots to strategically select the most beneficial viewpoints, ActiveSGM efficiently enhances mapping completeness, accuracy, and robustness to noisy semantic data. Our experiments on the Replica and Matterport3D datasets highlight the effectiveness of ActiveSGM in active semantic mapping tasks.
- Score: 15.159760685637366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective robotic autonomy in unknown environments demands proactive exploration and precise understanding of both geometry and semantics. In this paper, we propose ActiveSGM, an active semantic mapping framework designed to predict the informativeness of potential observations before execution. Built upon a 3D Gaussian Splatting (3DGS) mapping backbone, our approach employs semantic and geometric uncertainty quantification, coupled with a sparse semantic representation, to guide exploration. By enabling robots to strategically select the most beneficial viewpoints, ActiveSGM efficiently enhances mapping completeness, accuracy, and robustness to noisy semantic data, ultimately supporting more adaptive scene exploration. Our experiments on the Replica and Matterport3D datasets highlight the effectiveness of ActiveSGM in active semantic mapping tasks.
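To make the selection criterion concrete, here is a minimal sketch of uncertainty-driven next-best-view scoring in the spirit of the abstract: candidate views rendered from the 3DGS map are ranked by a combination of per-pixel semantic entropy and geometric uncertainty, and the robot moves to the top-scoring pose. The function names, array shapes, and the additive scoring rule are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of uncertainty-driven next-best-view selection in the
# spirit of ActiveSGM. All names, shapes, and the additive scoring rule
# are illustrative assumptions, not the authors' implementation.
import numpy as np

def semantic_entropy(probs: np.ndarray) -> np.ndarray:
    """Per-pixel Shannon entropy of class probabilities (H, W, C) -> (H, W)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def score_viewpoint(sem_probs: np.ndarray, geo_var: np.ndarray,
                    w_sem: float = 1.0, w_geo: float = 1.0) -> float:
    """Expected informativeness of one candidate view: high semantic
    entropy and high geometric variance both indicate unmapped or
    uncertain regions worth observing."""
    return w_sem * semantic_entropy(sem_probs).mean() + w_geo * geo_var.mean()

def select_next_view(candidates):
    """candidates: list of (pose, sem_probs, geo_var) tuples, where the
    maps would be rendered from the 3DGS model at each candidate pose."""
    scores = [score_viewpoint(s, g) for _, s, g in candidates]
    return candidates[int(np.argmax(scores))][0]

# Toy usage: two random candidate views over a 4x4 image, 5 classes.
rng = np.random.default_rng(0)
cands = []
for pose in ("pose_a", "pose_b"):
    logits = rng.normal(size=(4, 4, 5))
    probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    cands.append((pose, probs, rng.random((4, 4))))
print("next view:", select_next_view(cands))
```

In practice the candidate semantic and uncertainty maps would be rendered from the current 3DGS model rather than drawn at random as in this toy usage.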
Related papers
- TrajSceneLLM: A Multimodal Perspective on Semantic GPS Trajectory Analysis [0.0]
We propose TrajSceneLLM, a multimodal perspective for enhancing semantic understanding of GPS trajectories. We validate the proposed framework on Travel Mode Identification (TMI), a critical task for analyzing travel choices and understanding mobility behavior. This semantic enhancement promises significant potential for diverse downstream applications and future research in artificial intelligence.
arXiv Detail & Related papers (2025-06-19T15:31:40Z)
- Semantic Exploration and Dense Mapping of Complex Environments using Ground Robots Equipped with LiDAR and Panoramic Camera [7.330549613211134]
This paper presents a system for autonomous semantic exploration and dense semantic target mapping of a complex unknown environment using a ground robot equipped with a LiDAR-panoramic camera suite. We first redefine the task as completing both geometric coverage and semantic viewpoint observation. We then manage semantic and geometric viewpoints separately and propose a novel Priority-driven Decoupled Local Sampler to generate local viewpoint sets. In addition, we propose a Safe Aggressive Exploration State Machine, which allows aggressive exploration behavior while ensuring the robot's safety.
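The summary names a Priority-driven Decoupled Local Sampler; the sketch below shows one plausible reading of "decoupled" viewpoint management, with geometric-coverage and semantic-observation candidates kept in separate priority queues and the globally most urgent one executed first. The class, its fields, and the priority rule are assumptions, not the paper's design.

```python
# Hedged sketch of decoupled viewpoint handling: geometric-coverage and
# semantic-observation viewpoints live in separate priority queues and
# the higher-priority head is executed first. Field names and the
# priority rule are assumptions; the paper's sampler is more involved.
import heapq

class DecoupledViewpointQueue:
    def __init__(self):
        self._geo, self._sem = [], []   # min-heaps of (-priority, viewpoint)

    def push_geometric(self, viewpoint, coverage_gain: float):
        heapq.heappush(self._geo, (-coverage_gain, viewpoint))

    def push_semantic(self, viewpoint, label_uncertainty: float):
        heapq.heappush(self._sem, (-label_uncertainty, viewpoint))

    def pop_best(self):
        """Return the globally highest-priority viewpoint of either kind."""
        best_geo = self._geo[0][0] if self._geo else float("inf")
        best_sem = self._sem[0][0] if self._sem else float("inf")
        if best_geo == best_sem == float("inf"):
            return None
        queue = self._geo if best_geo <= best_sem else self._sem
        return heapq.heappop(queue)[1]

q = DecoupledViewpointQueue()
q.push_geometric("frontier_A", coverage_gain=0.8)
q.push_semantic("chair_closeup", label_uncertainty=0.9)
print(q.pop_best())  # -> "chair_closeup"
```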
arXiv Detail & Related papers (2025-05-28T21:27:32Z)
- Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO [63.140883026848286]
Active vision refers to the process of actively selecting where and how to look in order to gather task-relevant information. Recently, the use of Multimodal Large Language Models (MLLMs) as central planning and decision-making modules in robotic systems has gained extensive attention.
arXiv Detail & Related papers (2025-05-27T17:29:31Z)
- IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z)
- ActiveGAMER: Active GAussian Mapping through Efficient Rendering [27.914247021088237]
ActiveGAMER is an active mapping system that utilizes 3D Gaussian Splatting (3DGS) to achieve high-quality, real-time scene mapping and exploration. Our system autonomously explores and reconstructs environments with state-of-the-art rendering and photometric accuracy and completeness.
arXiv Detail & Related papers (2025-01-12T18:38:51Z)
- SADG: Segment Any Dynamic Gaussian Without Object Trackers [39.77468734311312]
SADG, Segment Any Dynamic Gaussian Without Object Trackers, is a novel approach that combines a dynamic Gaussian Splatting representation with semantic information without relying on object IDs. We learn semantically-aware features by leveraging masks generated from the Segment Anything Model (SAM) and utilizing our novel contrastive learning objective based on hard pixel mining. We evaluate SADG on proposed benchmarks and demonstrate the superior performance of our approach in segmenting objects within dynamic scenes.
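As a rough illustration of a mask-supervised pixel contrastive objective with hard pixel mining, the sketch below uses SAM-style masks as pseudo instance IDs and treats the most similar cross-mask pixels as hard negatives. The temperature, the mining rule, and the plain-numpy forward pass are illustrative assumptions; a real implementation would use an autodiff framework to train the Gaussian features.

```python
# Hedged sketch of a mask-supervised pixel contrastive loss with hard
# negative mining, loosely in the spirit of SADG's summary. SAM masks
# supply pseudo instance IDs; all hyperparameters are assumptions.
import numpy as np

def pixel_contrastive_loss(feats, mask_ids, anchor, temp=0.1, k_hard=8):
    """feats: (N, D) L2-normalized per-pixel features; mask_ids: (N,)
    SAM-derived instance IDs; anchor: index of the anchor pixel."""
    sims = feats @ feats[anchor]                    # cosine similarities
    same = mask_ids == mask_ids[anchor]
    same[anchor] = False
    pos = sims[same]                                # positives: same mask
    neg = sims[~same & (np.arange(len(sims)) != anchor)]
    neg_hard = np.sort(neg)[-k_hard:]               # hardest = most similar negatives
    logits = np.concatenate([pos, neg_hard]) / temp
    log_denom = np.log(np.exp(logits).sum())
    return float(-(pos / temp - log_denom).mean())  # InfoNCE averaged over positives

rng = np.random.default_rng(1)
f = rng.normal(size=(64, 16))
f /= np.linalg.norm(f, axis=1, keepdims=True)
ids = rng.integers(0, 4, size=64)
print(pixel_contrastive_loss(f, ids, anchor=0))
```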
arXiv Detail & Related papers (2024-11-28T17:47:48Z)
- Object-Oriented Material Classification and 3D Clustering for Improved Semantic Perception and Mapping in Mobile Robots [6.395242048226456]
We propose a complement-aware deep learning approach for RGB-D-based material classification built on top of an object-oriented pipeline.
We show a significant improvement in material classification and 3D clustering accuracy compared to state-of-the-art approaches for 3D semantic scene mapping.
arXiv Detail & Related papers (2024-07-08T16:25:01Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on the Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- Volumetric Semantically Consistent 3D Panoptic Mapping [77.13446499924977]
We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating semantic 3D maps suitable for autonomous agents in unstructured environments.
It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions.
The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics.
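One common way to realize "integrating semantic prediction confidence during mapping" is a confidence-weighted Bayesian update of each map cell's class belief, sketched below. The paper's actual integration scheme may differ; this is a generic illustration.

```python
# Hedged sketch of fusing per-frame semantic predictions into a 3D map
# cell with prediction confidence. The log-space update and the scalar
# confidence weight are assumptions, not the paper's exact scheme.
import numpy as np

def fuse_semantics(cell_log_odds, frame_probs, confidence):
    """Weighted Bayesian update of one voxel/region's class belief.
    cell_log_odds: (C,) accumulated log-probabilities;
    frame_probs: (C,) softmax output of the 2D segmenter for the pixel(s)
    projecting into this cell; confidence: scalar in [0, 1] that
    down-weights uncertain predictions."""
    update = confidence * np.log(np.clip(frame_probs, 1e-12, 1.0))
    return cell_log_odds + update

def cell_label(cell_log_odds):
    return int(np.argmax(cell_log_odds))

C = 4
cell = np.zeros(C)                       # uniform prior in log space
cell = fuse_semantics(cell, np.array([0.7, 0.1, 0.1, 0.1]), confidence=0.9)
cell = fuse_semantics(cell, np.array([0.3, 0.4, 0.2, 0.1]), confidence=0.2)
print(cell_label(cell))                  # the confident first frame dominates -> 0
```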
arXiv Detail & Related papers (2023-09-26T08:03:10Z)
- SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency [122.18108118190334]
We present a framework called Self-supervised Embodied Active Learning (SEAL).
It utilizes perception models trained on internet images to learn an active exploration policy.
We build and utilize 3D semantic maps to learn both action and perception in a completely self-supervised manner.
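The sketch below illustrates the general self-supervision loop the summary suggests: labels accumulated in a 3D semantic map are reprojected into new camera frames as pseudo-labels for the 2D perception model. The pinhole projection and the map interface are illustrative assumptions, not SEAL's code.

```python
# Hedged sketch of generating 2D pseudo-labels by reprojecting a labeled
# 3D semantic map, one way to realize the self-supervised loop that the
# summary describes. All interfaces here are illustrative assumptions.
import numpy as np

def project_map_labels(points_xyz, point_labels, K, T_cam_world, hw):
    """Render a (H, W) pseudo-label image (-1 = unlabeled) from labeled
    3D map points. points_xyz: (N, 3) world coords; K: (3, 3) intrinsics;
    T_cam_world: (4, 4) world-to-camera transform."""
    H, W = hw
    pts_h = np.c_[points_xyz, np.ones(len(points_xyz))]
    cam = (T_cam_world @ pts_h.T).T[:, :3]
    front = cam[:, 2] > 0.1                       # keep points in front of camera
    uvw = (K @ cam[front].T).T
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)
    labels = point_labels[front]
    pseudo = -np.ones((H, W), dtype=int)
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    pseudo[uv[ok, 1], uv[ok, 0]] = labels[ok]     # no z-buffering in this toy
    return pseudo

K = np.array([[50., 0, 32], [0, 50., 24], [0, 0, 1]])
pts = np.array([[0., 0., 2.], [0.5, 0.2, 3.]])
print(np.unique(project_map_labels(pts, np.array([1, 2]), K, np.eye(4), (48, 64))))
```

The resulting pseudo-label images could then fine-tune the 2D perception model without human annotation, closing the loop the abstract describes.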
arXiv Detail & Related papers (2021-12-02T06:26:38Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
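As a rough illustration of the detection-aware idea, the sketch below shares a point-wise backbone between an auxiliary detection branch and a segmentation head, fusing the detection features into the per-point class logits. Layer sizes, the toy MLP, and fusion by concatenation are assumptions, not the DASS architecture.

```python
# Hedged sketch of auxiliary-detection feature sharing for point cloud
# segmentation, in the spirit of the DASS summary. A real model would be
# a trained network; this toy forward pass only shows the data flow.
import numpy as np

rng = np.random.default_rng(2)

def mlp(x, w, b):
    return np.maximum(x @ w + b, 0.0)             # single ReLU layer

N, D, F, C = 1024, 3, 32, 10                      # points, xyz, feat dim, classes
points = rng.normal(size=(N, D))

# Shared point-wise backbone.
w0, b0 = rng.normal(size=(D, F)) * 0.1, np.zeros(F)
shared = mlp(points, w0, b0)

# Auxiliary detection branch: per-point features that would encode
# localization cues (trained with box supervision in the paper's setup).
w_det, b_det = rng.normal(size=(F, F)) * 0.1, np.zeros(F)
det_feats = mlp(shared, w_det, b_det)

# Segmentation head consumes backbone + detection features.
w_seg, b_seg = rng.normal(size=(2 * F, C)) * 0.1, np.zeros(C)
logits = np.concatenate([shared, det_feats], axis=1) @ w_seg + b_seg
print("per-point class predictions:", logits.argmax(axis=1)[:8])
```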
arXiv Detail & Related papers (2020-09-22T14:17:40Z)