Giving a Hand to Diffusion Models: a Two-Stage Approach to Improving Conditional Human Image Generation
- URL: http://arxiv.org/abs/2403.10731v2
- Date: Tue, 30 Apr 2024 10:29:22 GMT
- Title: Giving a Hand to Diffusion Models: a Two-Stage Approach to Improving Conditional Human Image Generation
- Authors: Anton Pelykh, Ozge Mercanoglu Sincan, Richard Bowden,
- Abstract summary: We introduce a novel approach to pose-conditioned human image generation, dividing the process into two stages: hand generation and subsequent body outpainting around the hands.
A novel blending technique is introduced to preserve the hand details during the second stage that combines the results of both stages in a coherent way.
Our approach not only enhances the quality of the generated hands but also offers improved control over hand pose, advancing the capabilities of pose-conditioned human image generation.
- Score: 29.79050316749927
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent years have seen significant progress in human image generation, particularly with the advancements in diffusion models. However, existing diffusion methods encounter challenges when producing consistent hand anatomy and the generated images often lack precise control over the hand pose. To address this limitation, we introduce a novel approach to pose-conditioned human image generation, dividing the process into two stages: hand generation and subsequent body outpainting around the hands. We propose training the hand generator in a multi-task setting to produce both hand images and their corresponding segmentation masks, and employ the trained model in the first stage of generation. An adapted ControlNet model is then used in the second stage to outpaint the body around the generated hands, producing the final result. A novel blending technique is introduced to preserve the hand details during the second stage that combines the results of both stages in a coherent way. This involves sequential expansion of the outpainted region while fusing the latent representations, to ensure a seamless and cohesive synthesis of the final image. Experimental evaluations demonstrate the superiority of our proposed method over state-of-the-art techniques, in both pose accuracy and image quality, as validated on the HaGRID dataset. Our approach not only enhances the quality of the generated hands but also offers improved control over hand pose, advancing the capabilities of pose-conditioned human image generation. The source code of the proposed approach is available at https://github.com/apelykh/hand-to-diffusion.
Related papers
- Adaptive Multi-Modal Control of Digital Human Hand Synthesis Using a Region-Aware Cycle Loss [12.565642618427844]
Diffusion models can synthesize images, including the generation of humans in specific poses.
Current models face challenges in adequately expressing conditional control for detailed hand pose generation.
We propose a novel Region-Aware Cycle Loss (RACL) that enables the diffusion model training to focus on improving the hand region.
arXiv Detail & Related papers (2024-09-13T19:09:19Z) - Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z) - HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances [34.50137847908887]
Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands.
Common artifacts include irregular hand poses, shapes, incorrect numbers of fingers, and physically implausible finger orientations.
We propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings in the generative process.
arXiv Detail & Related papers (2024-03-04T03:00:22Z) - TCIG: Two-Stage Controlled Image Generation with Quality Enhancement
through Diffusion [0.0]
A two-stage method that combines controllability and high quality in the generation of images is proposed.
By separating controllability from high quality, This method achieves outstanding results.
arXiv Detail & Related papers (2024-03-02T13:59:02Z) - Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS)
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z) - Annotated Hands for Generative Models [17.494997005870754]
Generative models such as GANs and diffusion models have demonstrated impressive image generation capabilities.
We propose a novel training framework for generative models that substantially improves the ability of such systems to create hand images.
arXiv Detail & Related papers (2024-01-26T18:57:54Z) - HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting [72.95232302438207]
Diffusion models have achieved remarkable success in generating realistic images.
But they suffer from generating accurate human hands, such as incorrect finger counts or irregular shapes.
This paper introduces a lightweight post-processing solution called HandRefiner.
arXiv Detail & Related papers (2023-11-29T08:52:08Z) - CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster
Image Generation [49.3016007471979]
Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks.
However, their widespread adoption is hindered by the high computational cost, which limits their real-time application.
We introduce a novel method dubbed CoDi, that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs.
arXiv Detail & Related papers (2023-10-02T17:59:18Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.