3D Segmentation of Humans in Point Clouds with Synthetic Data
- URL: http://arxiv.org/abs/2212.00786v4
- Date: Fri, 18 Aug 2023 12:51:38 GMT
- Title: 3D Segmentation of Humans in Point Clouds with Synthetic Data
- Authors: Ayça Takmaz, Jonas Schult, Irem Kaftan, Mertcan Akçay, Bastian Leibe, Robert Sumner, Francis Engelmann, Siyu Tang
- Abstract summary: We propose the task of joint 3D human semantic segmentation, instance segmentation and multi-human body-part segmentation.
We propose a framework for generating training data of synthetic humans interacting with real 3D scenes.
We also propose a novel transformer-based model, Human3D, which is the first end-to-end model for segmenting multiple human instances and their body-parts.
- Score: 21.518379214837278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmenting humans in 3D indoor scenes has become increasingly important with
the rise of human-centered robotics and AR/VR applications. To this end, we
propose the task of joint 3D human semantic segmentation, instance segmentation
and multi-human body-part segmentation. Few works have attempted to directly
segment humans in cluttered 3D scenes, which is largely due to the lack of
annotated training data of humans interacting with 3D scenes. We address this
challenge and propose a framework for generating training data of synthetic
humans interacting with real 3D scenes. Furthermore, we propose a novel
transformer-based model, Human3D, which is the first end-to-end model for
segmenting multiple human instances and their body-parts in a unified manner.
The key advantage of our synthetic data generation framework is its ability to
generate diverse and realistic human-scene interactions, with highly accurate
ground truth. Our experiments show that pre-training on synthetic data improves
performance on a wide variety of 3D human segmentation tasks. Finally, we
demonstrate that Human3D outperforms even task-specific state-of-the-art 3D
segmentation methods.
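To make the proposed joint task concrete, below is a minimal sketch (not the authors' code) of how the three per-point outputs could be laid out: a semantic label (human vs. background), a human instance ID, and a body-part label. The helper name, array layout, and toy data are illustrative assumptions only.

```python
# Hypothetical illustration of the joint segmentation target described above:
# per-point semantic labels, human instance IDs, and body-part labels.
# This is NOT the authors' code; names and layouts are assumptions.
import numpy as np

def make_joint_labels(num_points, human_masks, part_labels):
    """Combine per-human point masks and body-part labels into the three
    per-point label maps of the joint segmentation task.

    human_masks: one boolean mask of shape (num_points,) per human instance.
    part_labels: per human, an int array giving a body-part ID for each of
                 that human's points (aligned with the mask's True entries).
    """
    semantic = np.zeros(num_points, dtype=np.int64)     # 0 = background, 1 = human
    instance = np.full(num_points, -1, dtype=np.int64)  # -1 = no instance
    parts = np.full(num_points, -1, dtype=np.int64)     # -1 = no body part

    for inst_id, (mask, labels) in enumerate(zip(human_masks, part_labels)):
        semantic[mask] = 1
        instance[mask] = inst_id
        parts[mask] = labels
    return {"semantic": semantic, "instance": instance, "part": parts}

# Toy usage: a 10-point cloud containing two humans.
h0 = np.zeros(10, dtype=bool); h0[0:3] = True
h1 = np.zeros(10, dtype=bool); h1[5:9] = True
joint = make_joint_labels(10, [h0, h1],
                          [np.array([0, 1, 2]), np.array([0, 0, 3, 4])])
print(joint["semantic"])  # [1 1 1 0 0 1 1 1 1 0]
print(joint["instance"])  # [ 0  0  0 -1 -1  1  1  1  1 -1]
print(joint["part"])      # [ 0  1  2 -1 -1  0  0  3  4 -1]
```

A model like Human3D would predict these three maps jointly rather than via separate task-specific heads; the sketch above only fixes a plausible label format for the task, not the architecture.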
Related papers
- Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models [16.259040755335885]
Previous auto-regression-based 3D scene generation methods have struggled to accurately capture the joint distribution of multiple objects and input humans.
We introduce two spatial collision guidance mechanisms: human-object collision avoidance and object-room boundary constraints.
Our framework can generate more natural and plausible 3D scenes with precise human-scene interactions.
arXiv Detail & Related papers (2024-06-26T08:18:39Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Primitive-based 3D Human-Object Interaction Modelling and Programming [59.47308081630886]
We propose a novel 3D geometric primitive-based language to encode both humans and objects.
We build a new benchmark on 3D HAOI consisting of primitives together with their images.
We believe this primitive-based 3D HAOI representation would pave the way for 3D HAOI studies.
arXiv Detail & Related papers (2023-12-17T13:16:49Z)
- Revisit Human-Scene Interaction via Space Occupancy [55.67657438543008]
Human-Scene Interaction (HSI) generation is a challenging problem that is crucial for various downstream tasks.
In this work, we argue that, from an abstract physical perspective, interacting with a scene is essentially interacting with its space occupancy.
By treating pure motion sequences as records of humans interacting with invisible scene occupancy, we can aggregate motion-only data into a large-scale paired human-occupancy interaction database.
arXiv Detail & Related papers (2023-12-05T12:03:00Z)
- GenZI: Zero-Shot 3D Human-Scene Interaction Generation [39.9039943099911]
We propose GenZI, the first zero-shot approach to generating 3D human-scene interactions.
Key to GenZI is our distillation of interaction priors from large vision-language models (VLMs), which have learned a rich semantic space of 2D human-scene compositions.
In contrast to existing learning-based approaches, GenZI circumvents the conventional need for captured 3D interaction data.
arXiv Detail & Related papers (2023-11-29T15:40:11Z)
- HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes [54.61610144668777]
We present a novel scene-and-language conditioned generative model that can produce 3D human motions in 3D scenes.
Our experiments demonstrate that our model generates diverse and semantically consistent human motions in 3D scenes.
arXiv Detail & Related papers (2022-10-18T10:14:11Z)
- Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors [42.17542596399014]
We present a method for inferring diverse 3D models of human-object interactions from images.
Our method extracts high-level commonsense knowledge from large language models.
We quantitatively evaluate the inferred 3D models on a large human-object interaction dataset.
arXiv Detail & Related papers (2022-09-06T13:32:55Z)
- Compositional Human-Scene Interaction Synthesis with Semantic Control [16.93177243590465]
We aim to synthesize humans interacting with a given 3D scene controlled by high-level semantic specifications.
We design a novel transformer-based generative model, in which the articulated 3D human body surface points and 3D objects are jointly encoded.
Inspired by the compositional nature of interactions, in which humans can simultaneously interact with multiple objects, we define interaction semantics as the composition of varying numbers of atomic action-object pairs.
arXiv Detail & Related papers (2022-07-26T11:37:44Z)
- S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling [103.65625425020129]
We represent the pedestrian's shape, pose and skinning weights as neural implicit functions that are directly learned from data.
We demonstrate the effectiveness of our approach on various datasets and show that our reconstructions outperform existing state-of-the-art methods.
arXiv Detail & Related papers (2021-01-17T02:16:56Z)
- PLACE: Proximity Learning of Articulation and Contact in 3D Environments [70.50782687884839]
We propose a novel interaction generation method, named PLACE, which explicitly models the proximity between the human body and the 3D scene around it.
Our perceptual study shows that PLACE significantly improves over the state-of-the-art method, approaching the realism of real human-scene interaction.
arXiv Detail & Related papers (2020-08-12T21:00:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.