HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
- URL: http://arxiv.org/abs/2403.12011v1
- Date: Mon, 18 Mar 2024 17:48:31 GMT
- Title: HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
- Authors: Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang
- Abstract summary: We propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data.
Our model is a conditional diffusion model that takes both the 3D hand-object geometric structure and text description as inputs for image synthesis.
We adopt the generated 3D data for learning 6D object pose estimation and show its effectiveness in improving perception systems.
- Score: 42.49031063635004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D hand-object interaction data is scarce due to the hardware constraints in scaling up the data collection process. In this paper, we propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data. Our model is a conditional diffusion model that takes both the 3D hand-object geometric structure and text description as inputs for image synthesis. This offers a more controllable and realistic synthesis as we can specify the structure and style inputs in a disentangled manner. HOIDiffusion is trained by leveraging a diffusion model pre-trained on large-scale natural images and a few 3D human demonstrations. Beyond controllable image synthesis, we adopt the generated 3D data for learning 6D object pose estimation and show its effectiveness in improving perception systems. Project page: https://mq-zhang1.github.io/HOIDiffusion
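As a rough illustration only, the sketch below shows how the two disentangled conditioning paths described in the abstract (geometric structure for layout, text for style) might be wired into a noise-prediction network. The architecture, layer choices, and all names here are hypothetical simplifications, not the authors' implementation, which fine-tunes a large pre-trained text-to-image diffusion model.

```python
# Hypothetical, minimal sketch of the two conditioning paths described
# in the abstract -- NOT the authors' implementation. Timestep embedding
# and attention layers are omitted for brevity.
import torch
import torch.nn as nn

class ToyStructureTextDenoiser(nn.Module):
    def __init__(self, img_ch=3, geo_ch=4, text_dim=64, hidden=32):
        super().__init__()
        # Structure path: rendered geometry maps (e.g. surface normals +
        # depth of the posed hand and object) enter by channel concatenation.
        self.conv_in = nn.Conv2d(img_ch + geo_ch, hidden, 3, padding=1)
        # Style path: a pooled text embedding modulates features
        # globally (FiLM-style scale and shift).
        self.text_proj = nn.Linear(text_dim, 2 * hidden)
        self.conv_out = nn.Conv2d(hidden, img_ch, 3, padding=1)

    def forward(self, noisy_img, geo_maps, text_emb):
        h = torch.relu(self.conv_in(torch.cat([noisy_img, geo_maps], dim=1)))
        scale, shift = self.text_proj(text_emb).chunk(2, dim=-1)
        h = h * (1 + scale[..., None, None]) + shift[..., None, None]
        return self.conv_out(h)  # predicted noise for this timestep

model = ToyStructureTextDenoiser()
x_t = torch.randn(2, 3, 64, 64)   # noisy images at some timestep
geo = torch.randn(2, 4, 64, 64)   # normals (3) + depth (1) renderings
txt = torch.randn(2, 64)          # pooled text embedding
eps_hat = model(x_t, geo, txt)    # (2, 3, 64, 64)
```

Feeding structure by channel concatenation and style by global feature modulation is what keeps the two inputs independently controllable, which is the disentanglement the abstract refers to.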
Related papers
- 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing [52.68314936128752]
We propose a new paradigm to automatically generate 3D labeled training data by harnessing the power of pretrained large foundation models.
For each target semantic class, we first generate 2D images of a single object with varied structure and appearance, using diffusion models and ChatGPT-generated text prompts.
We transform these augmented images into 3D objects and construct virtual scenes by random composition.
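As a hypothetical illustration of this three-stage pipeline (2D generation, 3D lifting, random scene composition), the stubs below stand in for components the abstract names; none of these functions come from the paper or any real library.

```python
# Hypothetical sketch of the three-stage pipeline from the abstract.
# Every stub here (make_text_prompts, generate_2d_object, lift_to_3d,
# compose_scene) is a made-up placeholder, not an API from the paper.
import random

def make_text_prompts(cls, n):      # stub: LLM-generated prompts
    return [f"a photo of a {cls}, variation {i}" for i in range(n)]

def generate_2d_object(prompt):     # stub: text-to-image diffusion
    return {"prompt": prompt}       # placeholder for an image

def lift_to_3d(image):              # stub: single-image 3D reconstruction
    return {"mesh_from": image["prompt"]}

def compose_scene(labeled_objects):  # stub: random placement in a scene
    return {"objects": labeled_objects}

def build_synthetic_scenes(classes, prompts_per_class=10,
                           objects_per_scene=5, num_scenes=100):
    objects_3d = [(c, lift_to_3d(generate_2d_object(p)))
                  for c in classes
                  for p in make_text_prompts(c, prompts_per_class)]
    # Random composition of labeled 3D objects yields auto-annotated scenes.
    return [compose_scene(random.sample(objects_3d,
                                        min(objects_per_scene, len(objects_3d))))
            for _ in range(num_scenes)]

scenes = build_synthetic_scenes(["chair", "table"], num_scenes=3)
```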
arXiv Detail & Related papers (2024-08-25T09:31:22Z)
- DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach that increases data diversity and boosts 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
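The summary does not spell out the sampling strategy; one plausible reading, sketched below purely as an assumption, is to keep candidate grasp poses whose nearest neighbor in the training set is farther than some distance threshold.

```python
# Assumption-laden sketch of "similarity-aware" pose selection: keep
# candidate poses that are dissimilar from every training pose. The
# distance metric and threshold are illustrative choices, not the
# paper's; poses are flattened to vectors for simplicity.
import numpy as np

def select_novel_poses(candidates, train_poses, min_dist=0.5):
    """candidates: (N, D), train_poses: (M, D) pose parameter vectors."""
    # Pairwise Euclidean distances between candidates and training poses.
    diffs = candidates[:, None, :] - train_poses[None, :, :]   # (N, M, D)
    nn_dist = np.linalg.norm(diffs, axis=-1).min(axis=1)       # (N,)
    return candidates[nn_dist > min_dist]

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 48))        # e.g. 48-D MANO-like pose params
cands = rng.normal(size=(500, 48))
novel = select_novel_poses(cands, train)  # poses unlike the training set
print(novel.shape)
```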
arXiv Detail & Related papers (2024-03-27T13:56:08Z)
- VR-based generation of photorealistic synthetic data for training hand-object tracking models [0.0]
"blender-hoisynth" is an interactive synthetic data generator based on the Blender software.
Users can interact with objects via virtual hands using standard virtual reality hardware.
We replace large parts of the training data in the well-known DexYCB dataset with hoisynth data and train a state-of-the-art HOI reconstruction model with it.
arXiv Detail & Related papers (2024-01-31T14:32:56Z)
- NAP: Neural 3D Articulation Prior [31.875925637190328]
We propose Neural 3D Articulation Prior (NAP), the first 3D deep generative model to synthesize articulated 3D object models.
To generate articulated objects, we first design a novel articulation tree/graph parameterization and then apply a diffusion-denoising probabilistic model over this representation.
To capture both the geometry and the motion structure, whose distributions affect each other, we design a graph-attention denoising network for learning the reverse diffusion process.
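As an illustration of what an articulation tree/graph parameterization could look like, the data structure below pairs per-part shape codes with joint parameters; the specific fields are assumptions, not NAP's actual representation.

```python
# Illustrative articulation tree/graph in the spirit of the abstract;
# field choices (latent part geometry, joint axis/limits) are my
# assumptions, not NAP's actual parameterization.
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Joint:
    parent: int                    # index of parent part
    child: int                     # index of child part
    axis: np.ndarray               # 3D joint axis direction
    origin: np.ndarray             # 3D pivot point
    limits: tuple                  # (lower, upper) motion range
    joint_type: str = "revolute"   # or "prismatic"

@dataclass
class ArticulatedObject:
    part_latents: List[np.ndarray] = field(default_factory=list)  # per-part shape codes
    joints: List[Joint] = field(default_factory=list)

# A diffusion model over this representation would jointly denoise the
# part shape codes and the continuous joint parameters, letting geometry
# and motion structure influence each other as the abstract describes.
cabinet = ArticulatedObject(
    part_latents=[np.zeros(64), np.zeros(64)],        # body + door
    joints=[Joint(parent=0, child=1,
                  axis=np.array([0.0, 0.0, 1.0]),
                  origin=np.array([0.3, 0.0, 0.0]),
                  limits=(0.0, np.pi / 2))],
)
```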
arXiv Detail & Related papers (2023-05-25T17:59:35Z)
- DreamFusion: Text-to-3D using 2D Diffusion [52.52529213936283]
Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs.
Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D assets, which do not currently exist; in this work, we circumvent this limitation by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.
Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
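The mechanism behind this is score distillation sampling (SDS): noisy renderings of the 3D scene are scored by the frozen 2D model, and its denoising direction is pushed back into the scene parameters. A minimal sketch, with `diffusion_eps` as a placeholder for the frozen noise-prediction network:

```python
# Sketch of score distillation sampling (SDS). `diffusion_eps` stands in
# for the frozen 2D noise-prediction network; it is a placeholder here,
# not a real API. `alphas_cumprod` is the (T,) noise-schedule tensor.
import torch

def sds_loss(rendered, diffusion_eps, text_emb, alphas_cumprod):
    """rendered: (B,3,H,W) differentiable rendering of the 3D scene."""
    t = torch.randint(0, len(alphas_cumprod), (1,)).item()
    a = alphas_cumprod[t]
    noise = torch.randn_like(rendered)
    noisy = a.sqrt() * rendered + (1 - a).sqrt() * noise   # forward diffusion
    with torch.no_grad():                                  # model stays frozen
        eps_hat = diffusion_eps(noisy, t, text_emb)
    w = 1.0 - a                                            # one common weighting choice
    # Gradient of this surrogate w.r.t. the renderer's parameters equals
    # w * (eps_hat - noise) * d(rendered)/d(theta), the SDS update.
    return (w * (eps_hat - noise).detach() * rendered).sum()
```

Because the residual term is detached, backpropagating this surrogate loss only moves the scene parameters, never the frozen image model, which is why no 3D training data or model modification is needed.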
arXiv Detail & Related papers (2022-09-29T17:50:40Z)
- Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images [94.49117671450531]
State-of-the-art 3D generative models are GANs that use neural 3D volumetric representations for synthesis.
In this paper, we design a 3D GAN which can learn a disentangled model of objects, just from monocular observations.
arXiv Detail & Related papers (2022-03-29T22:03:18Z)
- Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose [10.028521796737314]
We study the problem of learning to estimate the 3D object pose from a few labelled examples and a collection of unlabelled data.
Our main contribution is a learning framework, neural view synthesis and matching, that can reliably transfer 3D pose annotations from labelled to unlabelled images.
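As a loose sketch of the render-and-match idea (the function names are placeholders, not the paper's API): synthesize feature templates at candidate poses and pseudo-label each unlabelled image with the best-matching pose.

```python
# Loose sketch of render-and-match pseudo-labeling. `synthesize_features`
# is a caller-supplied placeholder for a model that produces an image
# feature for a given candidate pose; it is not a function from the paper.
import numpy as np

def pseudo_label_pose(unlabeled_feat, candidate_poses, synthesize_features):
    """unlabeled_feat: (D,) feature vector of one unlabelled image."""
    best_pose, best_score = None, -np.inf
    for pose in candidate_poses:
        template = synthesize_features(pose)   # (D,) synthesized-view feature
        # Cosine similarity between the image and the synthesized template.
        score = float(unlabeled_feat @ template) / (
            np.linalg.norm(unlabeled_feat) * np.linalg.norm(template) + 1e-8)
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score  # in practice, keep only confident matches
```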
arXiv Detail & Related papers (2021-10-27T06:53:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.