Extending Maps with Semantic and Contextual Object Information for Robot
Navigation: a Learning-Based Framework using Visual and Depth Cues
- URL: http://arxiv.org/abs/2003.06336v1
- Date: Fri, 13 Mar 2020 15:05:23 GMT
- Title: Extending Maps with Semantic and Contextual Object Information for Robot
Navigation: a Learning-Based Framework using Visual and Depth Cues
- Authors: Renato Martins, Dhiego Bersan, Mario F. M. Campos and Erickson R.
Nascimento
- Abstract summary: This paper addresses the problem of building augmented metric representations of scenes with semantic information from RGB-D images.
We propose a complete framework to create an enhanced map representation of the environment with object-level information.
- Score: 12.984393386954219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of building augmented metric representations
of scenes with semantic information from RGB-D images. We propose a complete
framework to create an enhanced map representation of the environment with
object-level information to be used in several applications such as human-robot
interaction, assistive robotics, visual navigation, or in manipulation tasks.
Our formulation leverages a CNN-based object detector (Yolo) with a 3D
model-based segmentation technique to perform instance semantic segmentation,
and to localize, identify, and track different classes of objects in the scene.
The tracking and positioning of semantic classes is done with a dictionary of
Kalman filters in order to combine sensor measurements over time and then
providing more accurate maps. The formulation is designed to identify and to
disregard dynamic objects in order to obtain a medium-term invariant map
representation. The proposed method was evaluated with collected and publicly
available RGB-D data sequences acquired in different indoor scenes.
Experimental results show the potential of the technique to produce augmented
semantic maps containing several objects (notably doors). We also provide to
the community a dataset composed of annotated object classes (doors, fire
extinguishers, benches, water fountains) and their positioning, as well as the
source code as ROS packages.
Related papers
- Towards Localizing Structural Elements: Merging Geometrical Detection with Semantic Verification in RGB-D Data [0.0]
This paper presents a real-time pipeline for localizing building components, including wall and ground surfaces, by integrating geometric calculations for pure 3D plane detection.
It has a parallel multi-thread architecture to precisely estimate poses and equations of all the planes detected in the environment, filters the ones forming the map structure using a panoptic segmentation validation, and keeps only the validated building components.
It can also ensure (re-)association of these detected components into a unified 3D scene graph, bridging the gap between geometric accuracy and semantic understanding.
arXiv Detail & Related papers (2024-09-10T16:28:09Z) - Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images [15.921719523588996]
Existing monocular and RGB-D methods suffer from scale ambiguity due to missing or depth measurements.
We present CODERS, a one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images.
Our dataset, code, and demos will be available on our project page.
arXiv Detail & Related papers (2024-07-09T15:59:03Z) - Object-Oriented Material Classification and 3D Clustering for Improved Semantic Perception and Mapping in Mobile Robots [6.395242048226456]
We propose a complement-aware deep learning approach for RGB-D-based material classification built on top of an object-oriented pipeline.
We show a significant improvement in material classification and 3D clustering accuracy compared to state-of-the-art approaches for 3D semantic scene mapping.
arXiv Detail & Related papers (2024-07-08T16:25:01Z) - Mapping High-level Semantic Regions in Indoor Environments without
Object Recognition [50.624970503498226]
The present work proposes a method for semantic region mapping via embodied navigation in indoor environments.
To enable region identification, the method uses a vision-to-language model to provide scene information for mapping.
By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location.
arXiv Detail & Related papers (2024-03-11T18:09:50Z) - Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z) - SupeRGB-D: Zero-shot Instance Segmentation in Cluttered Indoor
Environments [67.34330257205525]
In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner.
We present a method that uses annotated objects to learn the objectness'' of pixels and generalize to unseen object categories in cluttered indoor environments.
arXiv Detail & Related papers (2022-12-22T17:59:48Z) - Object-level 3D Semantic Mapping using a Network of Smart Edge Sensors [25.393382192511716]
We extend a multi-view 3D semantic mapping system consisting of a network of distributed edge sensors with object-level information.
Our method is evaluated on the public Behave dataset where it shows pose estimation within a few centimeters and in real-world experiments with the sensor network in a challenging lab environment.
arXiv Detail & Related papers (2022-11-21T11:13:08Z) - Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language
Navigation [87.52136927091712]
We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by some language instructions.
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.
We propose a multi-granularity map, which contains both object fine-grained details (e.g., color, texture) and semantic classes, to represent objects more comprehensively.
arXiv Detail & Related papers (2022-10-14T04:23:27Z) - SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that depicts step-by-step.
This approach deviates from real-world problems in which human-only describes what the object and its surrounding look like and asks the robot to start navigation from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z) - Semantic and Geometric Modeling with Neural Message Passing in 3D Scene
Graphs for Hierarchical Mechanical Search [48.655167907740136]
We use a 3D scene graph representation to capture the hierarchical, semantic, and geometric aspects of this problem.
We introduce Hierarchical Mechanical Search (HMS), a method that guides an agent's actions towards finding a target object specified with a natural language description.
HMS is evaluated on a novel dataset of 500 3D scene graphs with dense placements of semantically related objects in storage locations.
arXiv Detail & Related papers (2020-12-07T21:04:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.