Mapping High-level Semantic Regions in Indoor Environments without
Object Recognition
- URL: http://arxiv.org/abs/2403.07076v1
- Date: Mon, 11 Mar 2024 18:09:50 GMT
- Title: Mapping High-level Semantic Regions in Indoor Environments without
Object Recognition
- Authors: Roberto Bigazzi, Lorenzo Baraldi, Shreyas Kousik, Rita Cucchiara,
Marco Pavone
- Abstract summary: The present work proposes a method for semantic region mapping via embodied navigation in indoor environments.
To enable region identification, the method uses a vision-to-language model to provide scene information for mapping.
By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location.
- Score: 50.624970503498226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots require a semantic understanding of their surroundings to operate in
an efficient and explainable way in human environments. In the literature,
there has been an extensive focus on object labeling and exhaustive scene graph
generation; less effort has been focused on the task of purely identifying and
mapping large semantic regions. The present work proposes a method for semantic
region mapping via embodied navigation in indoor environments, generating a
high-level representation of the knowledge of the agent. To enable region
identification, the method uses a vision-to-language model to provide scene
information for mapping. By projecting egocentric scene understanding into the
global frame, the proposed method generates a semantic map as a distribution
over possible region labels at each location. This mapping procedure is paired
with a trained navigation policy to enable autonomous map generation. The
proposed method significantly outperforms a variety of baselines, including an
object-based system and a pretrained scene classifier, in experiments in a
photorealistic simulator.
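Since the map is described as a distribution over region labels at each location, the projection step lends itself to a short sketch. The NumPy fragment below is a minimal illustration of that idea only: the grid resolution, label count, additive evidence rule, and all function names are assumptions rather than the authors' implementation, and the vision-to-language model and trained navigation policy are abstracted away as a given per-view label distribution.

```python
# Minimal sketch: accumulate egocentric region-label distributions into a
# global top-down grid. Everything here (shapes, resolution, update rule)
# is an illustrative assumption, not the paper's actual implementation.
import numpy as np

NUM_REGIONS = 10      # assumed label set, e.g. kitchen, bedroom, bathroom, ...
GRID_SIZE = 400       # global map is GRID_SIZE x GRID_SIZE cells
CELL_METERS = 0.05    # assumed map resolution: 5 cm per cell

# Unnormalized evidence for each region label at each map cell.
label_evidence = np.zeros((GRID_SIZE, GRID_SIZE, NUM_REGIONS))

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth image (H, W) to camera-frame 3D points (H*W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    x = (u.ravel() - cx) * z / fx
    y = (v.ravel() - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def integrate_observation(depth, pose, region_probs, fx, fy, cx, cy):
    """Project one egocentric view into the global frame.

    pose: (x, y, yaw) of the agent in world coordinates.
    region_probs: (NUM_REGIONS,) label distribution for this view, e.g.
        derived from a vision-to-language model's scored answers.
    """
    pts = backproject(depth, fx, fy, cx, cy)
    pts = pts[pts[:, 2] > 0]                  # drop invalid zero-depth pixels
    x0, y0, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotate the camera-frame ground-plane coordinates (x, z) by yaw, translate.
    wx = x0 + c * pts[:, 0] + s * pts[:, 2]
    wy = y0 - s * pts[:, 0] + c * pts[:, 2]
    gi = np.floor(wx / CELL_METERS).astype(int) + GRID_SIZE // 2
    gj = np.floor(wy / CELL_METERS).astype(int) + GRID_SIZE // 2
    ok = (gi >= 0) & (gi < GRID_SIZE) & (gj >= 0) & (gj < GRID_SIZE)
    # Every cell observed in this view accumulates the view's distribution;
    # np.add.at handles repeated cell indices correctly.
    np.add.at(label_evidence, (gj[ok], gi[ok]), region_probs)

def region_map():
    """Normalize evidence into a per-cell distribution over region labels."""
    total = label_evidence.sum(axis=-1, keepdims=True)
    uniform = np.full_like(label_evidence, 1.0 / NUM_REGIONS)
    return np.divide(label_evidence, total, out=uniform, where=total > 0)
```

Taking an argmax over the last axis of region_map() would give a hard region segmentation, while keeping the full distribution preserves the per-location uncertainty the abstract refers to.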
Related papers
- Neural Semantic Map-Learning for Autonomous Vehicles [85.8425492858912]
We present a mapping system that fuses local submaps gathered from a fleet of vehicles at a central instance to produce a coherent map of the road environment.
Our method jointly aligns and merges the noisy and incomplete local submaps using a scene-specific Neural Signed Distance Field.
We leverage memory-efficient sparse feature-grids to scale to large areas and introduce a confidence score to model uncertainty in scene reconstruction.
arXiv Detail & Related papers (2024-10-10T10:10:03Z)
- Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information [68.10033984296247]
This paper explores the domain of active localization, emphasizing the importance of viewpoint selection to enhance localization accuracy.
Our contributions involve using a data-driven approach with a simple architecture designed for real-time operation, a self-supervised data training method, and the capability to consistently integrate our map into a planning framework tailored for real-world robotics applications.
arXiv Detail & Related papers (2024-07-22T12:32:09Z)
- SymboSLAM: Semantic Map Generation in a Multi-Agent System [0.0]
Sub-symbolic artificial intelligence methods dominate the fields of environment-type classification and Simultaneous Localisation and Mapping.
This paper proposes a novel approach to environment-type classification through Symbolic Simultaneous Localisation and Mapping, SymboSLAM, to bridge the explainability gap.
arXiv Detail & Related papers (2024-03-22T00:48:52Z)
- Interactive Semantic Map Representation for Skill-based Visual Object Navigation [43.71312386938849]
This paper introduces a new scene semantic map representation, formed during the embodied agent's interaction with the indoor environment.
We have implemented this representation into a full-fledged navigation approach called SkillTron.
The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation.
arXiv Detail & Related papers (2023-11-07T16:30:12Z)
- Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation [87.52136927091712]
We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by some language instructions.
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.
We propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively.
arXiv Detail & Related papers (2022-10-14T04:23:27Z)
- Learning Semantics for Visual Place Recognition through Multi-Scale Attention [14.738954189759156]
We present the first VPR algorithm that learns robust global embeddings from both visual appearance and semantic content of the data.
Experiments on various scenarios validate this new approach and demonstrate its performance against state-of-the-art methods.
arXiv Detail & Related papers (2022-01-24T14:13:12Z)
- Lightweight Object-level Topological Semantic Mapping and Long-term Global Localization based on Graph Matching [19.706907816202946]
We present a novel lightweight object-level mapping and localization method with high accuracy and robustness.
We use object-level features with both semantic and geometric information to model landmarks in the environment.
Based on the proposed map, robust localization is achieved by constructing a novel local semantic scene graph descriptor.
arXiv Detail & Related papers (2022-01-16T05:47:07Z)
- Semantic Image Alignment for Vehicle Localization [111.59616433224662]
We present a novel approach to vehicle localization in dense semantic maps using semantic segmentation from a monocular camera.
In contrast to existing visual localization approaches, the system does not require additional keypoint features, handcrafted localization landmark extractors or expensive LiDAR sensors.
arXiv Detail & Related papers (2021-10-08T14:40:15Z)
- SceneEncoder: Scene-Aware Semantic Segmentation of Point Clouds with A Learnable Scene Descriptor [51.298760338410624]
We propose a SceneEncoder module that imposes scene-aware guidance to enhance the effect of global information.
The module predicts a scene descriptor, which learns to represent the categories of objects existing in the scene.
We also design a region similarity loss to propagate distinguishing features to their own neighboring points with the same label (sketched below).
arXiv Detail & Related papers (2020-01-24T16:53:30Z)
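As a companion to the SceneEncoder entry directly above, here is a rough PyTorch sketch of what a region similarity loss of the kind it mentions could look like: each point's feature is pulled toward neighboring points that share its semantic label. The neighborhood construction, the squared-distance form, and all names are illustrative assumptions, not the paper's actual formulation.

```python
# Rough sketch of a region-similarity-style loss: encourage feature agreement
# among neighboring points with the same semantic label. Illustrative only.
import torch

def region_similarity_loss(features: torch.Tensor,
                           labels: torch.Tensor,
                           neighbor_idx: torch.Tensor) -> torch.Tensor:
    """features: (N, D) per-point features.
    labels: (N,) integer semantic labels.
    neighbor_idx: (N, K) indices of each point's K nearest neighbors.
    """
    neigh_feat = features[neighbor_idx]              # (N, K, D)
    neigh_lab = labels[neighbor_idx]                 # (N, K)
    same = (neigh_lab == labels[:, None]).float()    # mask: same-label neighbors
    # Squared feature distance from each point to each of its neighbors.
    d2 = ((features[:, None, :] - neigh_feat) ** 2).sum(dim=-1)  # (N, K)
    # Average the distance over same-label pairs only.
    return (d2 * same).sum() / same.sum().clamp(min=1.0)
```

In training, a term like this would typically be added to the standard per-point segmentation loss with a small weight.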