HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
- URL: http://arxiv.org/abs/2403.01693v2
- Date: Mon, 22 Apr 2024 03:53:51 GMT
- Title: HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
- Authors: Supreeth Narasimhaswamy, Uttaran Bhattacharya, Xiang Chen, Ishita Dasgupta, Saayan Mitra, Minh Hoai
- Abstract summary: Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands.
Common artifacts include irregular hand poses and shapes, incorrect numbers of fingers, and physically implausible finger orientations.
We propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings in the generative process.
- Score: 34.50137847908887
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands. Common artifacts include irregular hand poses and shapes, incorrect numbers of fingers, and physically implausible finger orientations. To generate images with realistic hands, we propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings in the generative process. HanDiffuser consists of two components: a Text-to-Hand-Params diffusion model to generate SMPL-Body and MANO-Hand parameters from input text prompts, and a Text-Guided Hand-Params-to-Image diffusion model to synthesize images by conditioning on the prompts and hand parameters generated by the previous component. We incorporate multiple aspects of hand representation, including 3D shapes and joint-level finger positions, orientations and articulations, for robust learning and reliable performance during inference. We conduct extensive quantitative and qualitative experiments and perform user studies to demonstrate the efficacy of our method in generating images with high-quality hands.
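The two-component pipeline described in the abstract can be sketched as follows. All class and function names here are hypothetical placeholders standing in for the paper's two diffusion models, not the authors' actual implementation; the parameter dimensions follow the standard SMPL (69-d body pose) and MANO (45-d hand pose, 10-d shape) conventions.

```python
# Hypothetical sketch of the HanDiffuser two-stage pipeline described above.
# Interfaces are illustrative placeholders; real stages are diffusion models.

from dataclasses import dataclass
from typing import List


@dataclass
class HandParams:
    """Stand-in for SMPL-Body and MANO-Hand parameters."""
    body_pose: List[float]   # SMPL body pose (23 joints x 3)
    hand_pose: List[float]   # MANO joint articulations (15 joints x 3)
    hand_shape: List[float]  # MANO shape coefficients (betas)


def text_to_hand_params(prompt: str) -> HandParams:
    """Stage 1: a diffusion model maps the text prompt to body/hand
    parameters. Dummy zero values replace actual model sampling here."""
    return HandParams(body_pose=[0.0] * 69,
                      hand_pose=[0.0] * 45,
                      hand_shape=[0.0] * 10)


def hand_params_to_image(prompt: str, params: HandParams) -> dict:
    """Stage 2: a text-guided diffusion model synthesizes the image,
    conditioned on both the prompt and the stage-1 hand parameters."""
    n_dims = (len(params.body_pose)
              + len(params.hand_pose)
              + len(params.hand_shape))
    return {"prompt": prompt, "conditioning_dims": n_dims}


def handiffuser(prompt: str) -> dict:
    params = text_to_hand_params(prompt)          # text -> SMPL/MANO params
    return hand_params_to_image(prompt, params)   # (text, params) -> image


image = handiffuser("a person waving with an open palm")
print(image["conditioning_dims"])  # 69 + 45 + 10 = 124
```

The key design point is that stage 2 conditions on explicit 3D hand parameters rather than on text alone, which is what constrains finger counts and articulations to be physically plausible.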
Related papers
- Hand1000: Generating Realistic Hands from Text with Only 1,000 Images [29.562925199318197]
We propose a novel approach named Hand1000 that enables the generation of realistic hand images with a target gesture.
The training of Hand1000 is divided into three stages with the first stage aiming to enhance the model's understanding of hand anatomy.
We construct the first publicly available dataset specifically designed for text-to-hand image generation.
arXiv Detail & Related papers (2024-08-28T00:54:51Z) - RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance [41.213241942526935]
Diffusion models can generate high-quality human images, but their applicability is limited by instability in generating hands with correct structures.
We propose a conditional diffusion-based framework RHanDS to refine the hand region with the help of decoupled structure and style guidance.
The experimental results show that RHanDS can effectively refine malformed hands, achieving both structural and stylistic correctness compared with previous methods.
arXiv Detail & Related papers (2024-04-22T08:44:34Z) - HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to increase data diversity and boost 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinct from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z) - Giving a Hand to Diffusion Models: a Two-Stage Approach to Improving Conditional Human Image Generation [29.79050316749927]
We introduce a novel approach to pose-conditioned human image generation, dividing the process into two stages: hand generation and subsequent body outpainting around the hands.
A novel blending technique combines the results of both stages in a coherent way, preserving the hand details generated in the first stage.
Our approach not only enhances the quality of the generated hands but also offers improved control over hand pose, advancing the capabilities of pose-conditioned human image generation.
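The two-stage idea above — generate the hands first, outpaint the body around them, then blend — can be sketched as follows. The function names, dummy pixel values, and rectangular blending mask are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of the hand-first, outpaint-second approach above.
# Dummy constant images stand in for diffusion-model outputs.

import numpy as np


def generate_hand_region(size: int = 8) -> np.ndarray:
    """Stage 1: produce a high-quality hand crop (dummy constant image)."""
    return np.full((size, size), 0.9)


def outpaint_body(canvas_size: int = 16) -> np.ndarray:
    """Stage 2: outpaint the body around the hand region (dummy fill)."""
    return np.full((canvas_size, canvas_size), 0.3)


def blend(hand: np.ndarray, body: np.ndarray,
          top: int, left: int) -> np.ndarray:
    """Blending step: paste the stage-1 hand back into the stage-2 result
    so the hand details generated first survive in the final image."""
    out = body.copy()
    h, w = hand.shape
    out[top:top + h, left:left + w] = hand
    return out


hand = generate_hand_region()
body = outpaint_body()
final = blend(hand, body, top=4, left=4)
print(final[4, 4], final[0, 0])  # 0.9 (hand preserved), 0.3 (body)
```

The ordering is the point: because the hands are fixed before the body is outpainted, the second stage cannot corrupt them, and the blend only has to reconcile the seam.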
arXiv Detail & Related papers (2024-03-15T23:31:41Z) - Annotated Hands for Generative Models [17.494997005870754]
Generative models such as GANs and diffusion models have demonstrated impressive image generation capabilities.
We propose a novel training framework for generative models that substantially improves the ability of such systems to create hand images.
arXiv Detail & Related papers (2024-01-26T18:57:54Z) - BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics [50.88842027976421]
We propose BOTH57M, a novel multi-modal dataset for two-hand motion generation.
Our dataset includes accurate motion tracking for the human body and hands.
We also provide a strong baseline method, BOTH2Hands, for the novel task.
arXiv Detail & Related papers (2023-12-13T07:30:19Z) - HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting [72.95232302438207]
Diffusion models have achieved remarkable success in generating realistic images.
However, they struggle to generate accurate human hands, producing artifacts such as incorrect finger counts or irregular shapes.
This paper introduces a lightweight post-processing solution called HandRefiner.
arXiv Detail & Related papers (2023-11-29T08:52:08Z) - HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z) - Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes [58.551154822792284]
Implicit Two Hands (Im2Hands) is the first neural implicit representation of two interacting hands.
Im2Hands can produce fine-grained geometry of two hands with high hand-to-hand and hand-to-image coherency.
We experimentally demonstrate the effectiveness of Im2Hands on two-hand reconstruction in comparison to related methods.
arXiv Detail & Related papers (2023-02-28T06:38:25Z) - HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation [73.3790833537313]
Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on.
We propose HumanDiffusion, a coarse-to-fine alignment diffusion framework, for text-driven person image generation.
arXiv Detail & Related papers (2022-11-11T14:30:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.