HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images
- URL: http://arxiv.org/abs/2411.04332v2
- Date: Mon, 11 Nov 2024 16:31:24 GMT
- Title: HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images
- Authors: Zhenyue Qin, Yiqun Zhang, Yang Liu, Dylan Campbell
- Abstract summary: We propose HandCraft, a method for restoring malformed hands in diffusion-generated images.
This is achieved by automatically constructing masks and depth images for hands as conditioning signals.
Our plug-and-play hand restoration solution is compatible with existing pretrained diffusion models.
- Score: 20.81706200561224
- Abstract: Generative text-to-image models, such as Stable Diffusion, have demonstrated a remarkable ability to generate diverse, high-quality images. However, they are surprisingly inept when it comes to rendering human hands, which are often anatomically incorrect or reside in the "uncanny valley". In this paper, we propose a method HandCraft for restoring such malformed hands. This is achieved by automatically constructing masks and depth images for hands as conditioning signals using a parametric model, allowing a diffusion-based image editor to fix the hand's anatomy and adjust its pose while seamlessly integrating the changes into the original image, preserving pose, color, and style. Our plug-and-play hand restoration solution is compatible with existing pretrained diffusion models, and the restoration process facilitates adoption by eschewing any fine-tuning or training requirements for the diffusion models. We also contribute MalHand datasets that contain generated images with a wide variety of malformed hands in several styles for hand detector training and hand restoration benchmarking, and demonstrate through qualitative and quantitative evaluation that HandCraft not only restores anatomical correctness but also maintains the integrity of the overall image.
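The abstract describes masks and depth images being used as conditioning signals for a diffusion-based editor. As a rough illustration only, the sketch below shows how such signals could be derived from a rendered hand depth map and how an edited region could be pasted back into the original image. Everything here is an assumption rather than the authors' implementation: the function names `build_conditioning` and `composite` are hypothetical, the input convention (`np.inf` marks pixels with no hand) is invented for the example, and the final step is a generic mask blend standing in for the diffusion editor.

```python
import numpy as np

def build_conditioning(depth, pad=1):
    """Return (mask, depth_img) from a rendered hand depth map.

    depth: HxW float array; np.inf marks pixels with no hand (assumed convention).
    pad:   number of pixels by which the mask is grown around the hand.
    """
    hand = np.isfinite(depth)

    # Grow the mask with a 4-neighbourhood dilation so an editor has room to blend edges.
    mask = hand.copy()
    for _ in range(pad):
        grown = mask.copy()
        grown[1:, :] |= mask[:-1, :]
        grown[:-1, :] |= mask[1:, :]
        grown[:, 1:] |= mask[:, :-1]
        grown[:, :-1] |= mask[:, 1:]
        mask = grown

    # Normalise hand depth to [0, 1] with nearer pixels brighter; background stays 0.
    depth_img = np.zeros(depth.shape, dtype=np.float32)
    if hand.any():
        near, far = depth[hand].min(), depth[hand].max()
        span = max(far - near, 1e-6)
        depth_img[hand] = 1.0 - (depth[hand] - near) / span
    return mask, depth_img

def composite(original, edited, mask):
    """Keep `original` outside the mask and take `edited` inside it (generic blend)."""
    m = mask[..., None].astype(original.dtype)
    return m * edited + (1 - m) * original
```

In this sketch the mask limits where the image may change and the depth image carries the intended hand geometry; how HandCraft actually builds and consumes these signals is specified in the paper, not here.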
Related papers
- Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model [55.46927355649013]
We introduce a novel Multi-modal Guided Real-World Face Restoration technique.
MGFR can mitigate the generation of false facial attributes and identities.
We present the Reface-HQ dataset, comprising over 23,000 high-resolution facial images across 5,000 identities.
arXiv Detail & Related papers (2024-10-05T13:46:56Z)
- RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance [41.213241942526935]
Diffusion models can generate high-quality human images, but their applications are limited by their unreliability in generating hands with correct structures.
We propose a conditional diffusion-based framework RHanDS to refine the hand region with the help of decoupled structure and style guidance.
The experimental results show that RHanDS refines hands with more accurate structure and style than previous methods.
arXiv Detail & Related papers (2024-04-22T08:44:34Z)
- Giving a Hand to Diffusion Models: a Two-Stage Approach to Improving Conditional Human Image Generation [29.79050316749927]
We introduce a novel approach to pose-conditioned human image generation, dividing the process into two stages: hand generation and subsequent body outpainting around the hands.
A novel blending technique is introduced to preserve the hand details during the second stage that combines the results of both stages in a coherent way.
Our approach not only enhances the quality of the generated hands but also offers improved control over hand pose, advancing the capabilities of pose-conditioned human image generation.
arXiv Detail & Related papers (2024-03-15T23:31:41Z)
- HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances [34.50137847908887]
Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands.
Common artifacts include irregular hand poses, shapes, incorrect numbers of fingers, and physically implausible finger orientations.
We propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings in the generative process.
arXiv Detail & Related papers (2024-03-04T03:00:22Z)
- HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting [72.95232302438207]
Diffusion models have achieved remarkable success in generating realistic images.
However, they struggle to generate accurate human hands, often producing incorrect finger counts or irregular shapes.
This paper introduces a lightweight post-processing solution called HandRefiner.
arXiv Detail & Related papers (2023-11-29T08:52:08Z)
- Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models [63.20512617502273]
We propose a method called SDD to prevent problematic content generation in text-to-image diffusion models.
Our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality.
arXiv Detail & Related papers (2023-07-12T07:48:29Z)
- ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation [8.803251014279502]
Large-scale text-to-image models have demonstrated amazing ability to synthesize diverse and high-fidelity images.
Current models can impose significant changes to the original image content during the editing process.
We propose ReGeneration learning in an image-to-image diffusion model (ReDiffuser).
arXiv Detail & Related papers (2023-05-08T12:08:12Z)
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z)
- Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation [59.3035531612715]
Existing methods often struggle to generate plausible hand poses when the hand is heavily occluded or blurred.
In videos, the movements of the hand allow us to observe various parts of the hand that may be occluded or blurred in a single frame.
We propose the Deformer: a framework that implicitly reasons about the relationship between hand parts within the same image.
arXiv Detail & Related papers (2023-03-09T02:24:30Z)
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.