Related papers: Semantic and Geometric Modeling with Neural Message Passing in 3D Scene Graphs for Hierarchical Mechanical Search

Semantic and Geometric Modeling with Neural Message Passing in 3D Scene Graphs for Hierarchical Mechanical Search

URL: http://arxiv.org/abs/2012.04060v1
Date: Mon, 7 Dec 2020 21:04:34 GMT
Title: Semantic and Geometric Modeling with Neural Message Passing in 3D Scene Graphs for Hierarchical Mechanical Search
Authors: Andrey Kurenkov, Roberto Mart\'in-Mart\'in, Jeff Ichnowski, Ken Goldberg, Silvio Savarese
Abstract summary: We use a 3D scene graph representation to capture the hierarchical, semantic, and geometric aspects of this problem. We introduce Hierarchical Mechanical Search (HMS), a method that guides an agent's actions towards finding a target object specified with a natural language description. HMS is evaluated on a novel dataset of 500 3D scene graphs with dense placements of semantically related objects in storage locations.
Score: 48.655167907740136
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Searching for objects in indoor organized environments such as homes or offices is part of our everyday activities. When looking for a target object, we jointly reason about the rooms and containers the object is likely to be in; the same type of container will have a different probability of having the target depending on the room it is in. We also combine geometric and semantic information to infer what container is best to search, or what other objects are best to move, if the target object is hidden from view. We propose to use a 3D scene graph representation to capture the hierarchical, semantic, and geometric aspects of this problem. To exploit this representation in a search process, we introduce Hierarchical Mechanical Search (HMS), a method that guides an agent's actions towards finding a target object specified with a natural language description. HMS is based on a novel neural network architecture that uses neural message passing of vectors with visual, geometric, and linguistic information to allow HMS to reason across layers of the graph while combining semantic and geometric cues. HMS is evaluated on a novel dataset of 500 3D scene graphs with dense placements of semantically related objects in storage locations, and is shown to be significantly better than several baselines at finding objects and close to the oracle policy in terms of the median number of actions required. Additional qualitative results can be found at https://ai.stanford.edu/mech-search/hms.

Related papers

Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding. An adaptive-octree structure is developed that stores semantics and depicts the occupancy of an object adjustably according to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z)
Multiview Scene Graph [7.460438046915524]
A proper scene representation is central to the pursuit of spatial intelligence. We propose to build Multiview Scene Graphs (MSG) from unposed images. MSG represents a scene topologically with interconnected place and object nodes.
arXiv Detail & Related papers (2024-10-15T02:04:05Z)
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers [65.51132104404051]
We introduce the use of object identifiers and object-centric representations to interact with scenes at the object level. Our model significantly outperforms existing methods on benchmarks including ScanRefer, Multi3DRefer, Scan2Cap, ScanQA, and SQA3D.
arXiv Detail & Related papers (2023-12-13T14:27:45Z)
Task-Driven Graph Attention for Hierarchical Relational Object Navigation [25.571175038938527]
Embodied AI agents in large scenes often need to navigate to find objects. We study a naturally emerging variant of the object navigation task, hierarchical object navigation (HRON) We propose a solution that uses scene graphs as part of its input and integrates graph neural networks as its backbone.
arXiv Detail & Related papers (2023-06-23T19:50:48Z)
Generating Visual Spatial Description via Holistic 3D Scene Understanding [88.99773815159345]
Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images. With an external 3D scene extractor, we obtain the 3D objects and scene features for input images. We construct a target object-centered 3D spatial scene graph (Go3D-S2G), such that we model the spatial semantics of target objects within the holistic 3D scenes.
arXiv Detail & Related papers (2023-05-19T15:53:56Z)
SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots. Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that depicts step-by-step. This approach deviates from real-world problems in which human-only describes what the object and its surrounding look like and asks the robot to start navigation from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions [94.17683799712397]
We focus on scene graphs, a data structure that organizes the entities of a scene in a graph. We propose a learned method that regresses a scene graph from the point cloud of a scene. We show the application of our method in a domain-agnostic retrieval task, where graphs serve as an intermediate representation for 3D-3D and 2D-3D matching.
arXiv Detail & Related papers (2020-04-08T12:25:25Z)
Extending Maps with Semantic and Contextual Object Information for Robot Navigation: a Learning-Based Framework using Visual and Depth Cues [12.984393386954219]
This paper addresses the problem of building augmented metric representations of scenes with semantic information from RGB-D images. We propose a complete framework to create an enhanced map representation of the environment with object-level information.
arXiv Detail & Related papers (2020-03-13T15:05:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.