Semantic and Geometric Modeling with Neural Message Passing in 3D Scene
Graphs for Hierarchical Mechanical Search
- URL: http://arxiv.org/abs/2012.04060v1
- Date: Mon, 7 Dec 2020 21:04:34 GMT
- Title: Semantic and Geometric Modeling with Neural Message Passing in 3D Scene
Graphs for Hierarchical Mechanical Search
- Authors: Andrey Kurenkov, Roberto Mart\'in-Mart\'in, Jeff Ichnowski, Ken
Goldberg, Silvio Savarese
- Abstract summary: We use a 3D scene graph representation to capture the hierarchical, semantic, and geometric aspects of this problem.
We introduce Hierarchical Mechanical Search (HMS), a method that guides an agent's actions towards finding a target object specified with a natural language description.
HMS is evaluated on a novel dataset of 500 3D scene graphs with dense placements of semantically related objects in storage locations.
- Score: 48.655167907740136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Searching for objects in indoor organized environments such as homes or
offices is part of our everyday activities. When looking for a target object,
we jointly reason about the rooms and containers the object is likely to be in;
the same type of container will have a different probability of having the
target depending on the room it is in. We also combine geometric and semantic
information to infer what container is best to search, or what other objects
are best to move, if the target object is hidden from view. We propose to use a
3D scene graph representation to capture the hierarchical, semantic, and
geometric aspects of this problem. To exploit this representation in a search
process, we introduce Hierarchical Mechanical Search (HMS), a method that
guides an agent's actions towards finding a target object specified with a
natural language description. HMS is based on a novel neural network
architecture that uses neural message passing of vectors with visual,
geometric, and linguistic information to allow HMS to reason across layers of
the graph while combining semantic and geometric cues. HMS is evaluated on a
novel dataset of 500 3D scene graphs with dense placements of semantically
related objects in storage locations, and is shown to be significantly better
than several baselines at finding objects and close to the oracle policy in
terms of the median number of actions required. Additional qualitative results
can be found at https://ai.stanford.edu/mech-search/hms.
Related papers
- Beyond Bare Queries: Open-Vocabulary Object Retrieval with 3D Scene Graph [0.4369058206183195]
Existing CLIP-based open-vocabulary methods successfully perform 3D object retrieval with simple (bare) queries.
We propose a modular approach called BBQ which constructs 3D scene spatial graph representation with metric edges.
BBQ employs robust DINO-powered associations to form 3D objects, an advanced raycasting algorithm to project them to 2D, and a vision-language model to describe them as graph nodes.
arXiv Detail & Related papers (2024-06-11T09:57:04Z) - ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and
Planning [125.90002884194838]
ConceptGraphs is an open-vocabulary graph-structured representation for 3D scenes.
It is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association.
We demonstrate the utility of this representation through a number of downstream planning tasks.
arXiv Detail & Related papers (2023-09-28T17:53:38Z) - Task-Driven Graph Attention for Hierarchical Relational Object
Navigation [25.571175038938527]
Embodied AI agents in large scenes often need to navigate to find objects.
We study a naturally emerging variant of the object navigation task, hierarchical object navigation (HRON)
We propose a solution that uses scene graphs as part of its input and integrates graph neural networks as its backbone.
arXiv Detail & Related papers (2023-06-23T19:50:48Z) - Generating Visual Spatial Description via Holistic 3D Scene
Understanding [88.99773815159345]
Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images.
With an external 3D scene extractor, we obtain the 3D objects and scene features for input images.
We construct a target object-centered 3D spatial scene graph (Go3D-S2G), such that we model the spatial semantics of target objects within the holistic 3D scenes.
arXiv Detail & Related papers (2023-05-19T15:53:56Z) - Object-level 3D Semantic Mapping using a Network of Smart Edge Sensors [25.393382192511716]
We extend a multi-view 3D semantic mapping system consisting of a network of distributed edge sensors with object-level information.
Our method is evaluated on the public Behave dataset where it shows pose estimation within a few centimeters and in real-world experiments with the sensor network in a challenging lab environment.
arXiv Detail & Related papers (2022-11-21T11:13:08Z) - Zero-shot point cloud segmentation by transferring geometric primitives [68.18710039217336]
We investigate zero-shot point cloud semantic segmentation, where the network is trained on seen objects and able to segment unseen objects.
We propose a novel framework to learn the geometric primitives shared in seen and unseen categories' objects and employ a fine-grained alignment between language and the learned geometric primitives.
arXiv Detail & Related papers (2022-10-18T15:06:54Z) - SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that depicts step-by-step.
This approach deviates from real-world problems in which human-only describes what the object and its surrounding look like and asks the robot to start navigation from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z) - Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions [94.17683799712397]
We focus on scene graphs, a data structure that organizes the entities of a scene in a graph.
We propose a learned method that regresses a scene graph from the point cloud of a scene.
We show the application of our method in a domain-agnostic retrieval task, where graphs serve as an intermediate representation for 3D-3D and 2D-3D matching.
arXiv Detail & Related papers (2020-04-08T12:25:25Z) - Extending Maps with Semantic and Contextual Object Information for Robot
Navigation: a Learning-Based Framework using Visual and Depth Cues [12.984393386954219]
This paper addresses the problem of building augmented metric representations of scenes with semantic information from RGB-D images.
We propose a complete framework to create an enhanced map representation of the environment with object-level information.
arXiv Detail & Related papers (2020-03-13T15:05:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.