Primitive-based 3D Human-Object Interaction Modelling and Programming
- URL: http://arxiv.org/abs/2312.10714v1
- Date: Sun, 17 Dec 2023 13:16:49 GMT
- Title: Primitive-based 3D Human-Object Interaction Modelling and Programming
- Authors: Siqi Liu, Yong-Lu Li, Zhou Fang, Xinpeng Liu, Yang You, Cewu Lu
- Abstract summary: We propose a novel 3D geometric primitive-based language to encode both humans and objects.
We build a new benchmark on 3D HAOI consisting of primitives together with their images.
We believe this primitive-based 3D HAOI representation would pave the way for 3D HAOI studies.
- Score: 59.47308081630886
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Embedding Human and Articulated Object Interaction (HAOI) in 3D is an
important direction for a deeper human activity understanding. Different from
previous works that use parametric and CAD models to represent humans and
objects, in this work, we propose a novel 3D geometric primitive-based language
to encode both humans and objects. Given our new paradigm, humans and objects
are all compositions of primitives instead of heterogeneous entities. Thus,
mutual information learning may be achieved between the limited 3D data of
humans and different object categories. Moreover, considering the simplicity of
the expression and the richness of the information it contains, we choose the
superquadric as the primitive representation. To explore an effective embedding
of HAOI for the machine, we build a new benchmark on 3D HAOI consisting of
primitives together with their images and propose a task requiring machines to
recover 3D HAOI using primitives from images. Moreover, we propose a baseline
of single-view 3D reconstruction on HAOI. We believe this primitive-based 3D
HAOI representation would pave the way for 3D HAOI studies. Our code and data
are available at https://mvig-rhos.com/p3haoi.
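The abstract names the superquadric as the primitive but does not give its parameterization. As a rough illustration only, the sketch below evaluates the standard textbook inside-outside function of a superquadric in its local frame, F = ((|x/a1|)^(2/e2) + (|y/a2|)^(2/e2))^(e2/e1) + (|z/a3|)^(2/e1). The function name, argument names, and example values are hypothetical and are not taken from the paper's code, which may use a different formulation.

```python
def superquadric_inside_outside(point, scale, exponents):
    """Evaluate the standard superquadric inside-outside function F.

    F < 1 means the point is inside the primitive, F == 1 on its surface,
    and F > 1 outside. The point is assumed to be expressed in the
    primitive's local frame (no rotation or translation is handled here).
    """
    x, y, z = point
    a1, a2, a3 = scale        # semi-axis lengths along x, y, z
    e1, e2 = exponents        # shape exponents (e1 = e2 = 1 gives an ellipsoid)
    xy_term = abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)
    return xy_term ** (e2 / e1) + abs(z / a3) ** (2.0 / e1)


# Hypothetical example primitives: a unit sphere and a box-like shape.
sphere = dict(scale=(1.0, 1.0, 1.0), exponents=(1.0, 1.0))
boxy = dict(scale=(1.0, 1.0, 1.0), exponents=(0.1, 0.1))

corner = (0.9, 0.9, 0.9)
print(superquadric_inside_outside(corner, **sphere) > 1.0)  # outside the sphere
print(superquadric_inside_outside(corner, **boxy) < 1.0)    # inside the box-like shape
```

With only five shape parameters (three scales, two exponents) plus a pose, one primitive can range from ellipsoid to near-box, which is in line with the abstract's remark about a simple yet information-rich expression.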
Related papers
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- Perceiving Unseen 3D Objects by Poking the Objects [45.70559270947074]
We propose a poking-based approach that automatically discovers and reconstructs 3D objects.
The poking process not only enables the robot to discover unseen 3D objects but also produces multi-view observations.
The experiments on real-world data show that our approach could unsupervisedly discover and reconstruct unseen 3D objects with high quality.
arXiv Detail & Related papers (2023-02-26T18:22:13Z)
- Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using Pixel-aligned Reconstruction Priors [56.192682114114724]
Get3DHuman is a novel 3D human framework that can significantly boost the realism and diversity of the generated outcomes.
Our key observation is that the 3D generator can profit from human-related priors learned through 2D human generators and 3D reconstructors.
arXiv Detail & Related papers (2023-02-02T15:37:46Z)
- Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors [42.17542596399014]
We present a method for inferring diverse 3D models of human-object interactions from images.
Our method extracts high-level commonsense knowledge from large language models.
We quantitatively evaluate the inferred 3D models on a large human-object interaction dataset.
arXiv Detail & Related papers (2022-09-06T13:32:55Z)
- CHORE: Contact, Human and Object REconstruction from a single RGB image [40.817960406002506]
CHORE is a novel method that learns to jointly reconstruct the human and the object from a single RGB image.
We compute a neural reconstruction of human and object represented implicitly with two unsigned distance fields.
Experiments show that our joint reconstruction learned with the proposed strategy significantly outperforms the SOTA.
arXiv Detail & Related papers (2022-04-05T18:38:06Z)
- 3D-Aware Semantic-Guided Generative Model for Human Synthesis [67.86621343494998]
This paper proposes a 3D-aware Semantic-Guided Generative Model (3D-SGAN) for human image synthesis.
Our experiments on the DeepFashion dataset show that 3D-SGAN significantly outperforms the most recent baselines.
arXiv Detail & Related papers (2021-12-02T17:10:53Z)
- Interactive Annotation of 3D Object Geometry using 2D Scribbles [84.51514043814066]
In this paper, we propose an interactive framework for annotating 3D object geometry from point cloud data and RGB imagery.
Our framework targets naive users without artistic or graphics expertise.
arXiv Detail & Related papers (2020-08-24T21:51:29Z)
- Parameter-Efficient Person Re-identification in the 3D Space [51.092669618679615]
We project 2D images to a 3D space and introduce a novel parameter-efficient Omni-scale Graph Network (OG-Net) to learn the pedestrian representation directly from 3D point clouds.
OG-Net effectively exploits the local information provided by sparse 3D points and takes advantage of the structure and appearance information in a coherent manner.
We are among the first attempts to conduct person re-identification in the 3D space.
arXiv Detail & Related papers (2020-06-08T13:20:33Z)
- Detailed 2D-3D Joint Representation for Human-Object Interaction [45.71407935014447]
We propose a detailed 2D-3D joint representation learning method for HOI learning.
First, we utilize the single-view human body capture method to obtain detailed 3D body, face and hand shapes.
Next, we estimate the 3D object location and size with reference to the 2D human-object spatial configuration and object category priors.
arXiv Detail & Related papers (2020-04-17T10:22:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.