Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
- URL: http://arxiv.org/abs/2101.06894v2
- Date: Sun, 24 Jan 2021 18:00:50 GMT
- Title: Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
- Authors: Antoni Rosinol, Andrew Violette, Marcus Abate, Nathan Hughes, Yun
Chang, Jingnan Shi, Arjun Gupta, Luca Carlone
- Abstract summary: Humans are able to form a complex mental model of the environment they move in.
Current robots' internal representations still provide a partial and fragmented understanding of the environment.
This paper introduces a novel representation, a 3D Dynamic Scene Graph.
- Score: 20.960087818959206
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans are able to form a complex mental model of the environment they move
in. This mental model captures geometric and semantic aspects of the scene,
describes the environment at multiple levels of abstractions (e.g., objects,
rooms, buildings), includes static and dynamic entities and their relations
(e.g., a person is in a room at a given time). In contrast, current robots'
internal representations still provide a partial and fragmented understanding
of the environment, either in the form of a sparse or dense set of geometric
primitives (e.g., points, lines, planes, voxels) or as a collection of objects.
This paper attempts to reduce the gap between robot and human perception by
introducing a novel representation, a 3D Dynamic Scene Graph(DSG), that
seamlessly captures metric and semantic aspects of a dynamic environment. A DSG
is a layered graph where nodes represent spatial concepts at different levels
of abstraction, and edges represent spatio-temporal relations among nodes. Our
second contribution is Kimera, the first fully automatic method to build a DSG
from visual-inertial data. Kimera includes state-of-the-art techniques for
visual-inertial SLAM, metric-semantic 3D reconstruction, object localization,
human pose and shape estimation, and scene parsing. Our third contribution is a
comprehensive evaluation of Kimera in real-life datasets and photo-realistic
simulations, including a newly released dataset, uHumans2, which simulates a
collection of crowded indoor and outdoor scenes. Our evaluation shows that
Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates
an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a
complex indoor environment with tens of objects and humans in minutes. Our
final contribution shows how to use a DSG for real-time hierarchical semantic
path-planning. The core modules in Kimera are open-source.
Related papers
- PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments [49.38692556283867]
We propose a photo-realistic and geometry-aware RGB-D SLAM method by extending Gaussian splatting.
Our method is composed of three main modules to 1) map the dynamic foreground including non-rigid humans and rigid items, 2) reconstruct the static background, and 3) localize the camera.
Experiments on various real-world datasets demonstrate that our method outperforms state-of-the-art approaches in terms of camera localization and scene representation.
arXiv Detail & Related papers (2024-11-24T12:00:55Z) - Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z) - ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and
Planning [125.90002884194838]
ConceptGraphs is an open-vocabulary graph-structured representation for 3D scenes.
It is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association.
We demonstrate the utility of this representation through a number of downstream planning tasks.
arXiv Detail & Related papers (2023-09-28T17:53:38Z) - MUG: Multi-human Graph Network for 3D Mesh Reconstruction from 2D Pose [20.099670445427964]
Reconstructing multi-human body mesh from a single monocular image is an important but challenging computer vision problem.
In this work, through a single graph neural network, we construct coherent multi-human meshes using only multi-human 2D pose as input.
arXiv Detail & Related papers (2022-05-25T08:54:52Z) - Human-Aware Object Placement for Visual Environment Reconstruction [63.14733166375534]
We show that human-scene interactions can be leveraged to improve the 3D reconstruction of a scene from a monocular RGB video.
Our key idea is that, as a person moves through a scene and interacts with it, we accumulate HSIs across multiple input images.
We show that our scene reconstruction can be used to refine the initial 3D human pose and shape estimation.
arXiv Detail & Related papers (2022-03-07T18:59:02Z) - HSPACE: Synthetic Parametric Humans Animated in Complex Environments [67.8628917474705]
We build a large-scale photo-realistic dataset, Human-SPACE, of animated humans placed in complex indoor and outdoor environments.
We combine a hundred diverse individuals of varying ages, gender, proportions, and ethnicity, with hundreds of motions and scenes, in order to generate an initial dataset of over 1 million frames.
Assets are generated automatically, at scale, and are compatible with existing real time rendering and game engines.
arXiv Detail & Related papers (2021-12-23T22:27:55Z) - Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions [94.17683799712397]
We focus on scene graphs, a data structure that organizes the entities of a scene in a graph.
We propose a learned method that regresses a scene graph from the point cloud of a scene.
We show the application of our method in a domain-agnostic retrieval task, where graphs serve as an intermediate representation for 3D-3D and 2D-3D matching.
arXiv Detail & Related papers (2020-04-08T12:25:25Z) - Shallow2Deep: Indoor Scene Modeling by Single Image Understanding [42.87957414916607]
We present an automatic indoor scene modeling approach using deep features from neural networks.
Given a single RGB image, our method simultaneously recovers semantic contents, 3D geometry and object relationship.
arXiv Detail & Related papers (2020-02-22T23:27:22Z) - 3D Dynamic Scene Graphs: Actionable Spatial Perception with Places,
Objects, and Humans [27.747241700017728]
We present a unified representation for actionable spatial perception: 3D Dynamic Scene Graphs.
3D Dynamic Scene Graphs can have a profound impact on planning and decision-making, human-robot interaction, long-term autonomy, and scene prediction.
arXiv Detail & Related papers (2020-02-15T00:46:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.