HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting
- URL: http://arxiv.org/abs/2311.17957v2
- Date: Fri, 16 Aug 2024 05:35:21 GMT
- Title: HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting
- Authors: Wenquan Lu, Yufei Xu, Jing Zhang, Chaoyue Wang, Dacheng Tao
- Abstract summary: Diffusion models have achieved remarkable success in generating realistic images.
However, they struggle to generate accurate human hands, producing artifacts such as incorrect finger counts or irregular shapes.
This paper introduces a lightweight post-processing solution called HandRefiner.
- Score: 72.95232302438207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have achieved remarkable success in generating realistic images but struggle to generate accurate human hands, producing artifacts such as incorrect finger counts or irregular shapes. This difficulty arises from the complexity of learning the physical structure and pose of hands from training images, which involve extensive deformations and occlusions. To correct hand generation, our paper introduces a lightweight post-processing solution called $\textbf{HandRefiner}$. HandRefiner employs a conditional inpainting approach to rectify malformed hands while leaving the other parts of the image untouched. We leverage a hand mesh reconstruction model that consistently adheres to the correct number of fingers and hand shape, while also fitting the desired hand pose in the generated image. Given an image whose generation failed due to malformed hands, we utilize ControlNet modules to re-inject correct hand information. Additionally, we uncover a phase transition phenomenon within ControlNet as we vary the control strength, which enables us to take advantage of more readily available synthetic data without suffering from the domain gap between realistic and synthetic hands. Experiments demonstrate that HandRefiner significantly improves generation quality both quantitatively and qualitatively. The code is available at https://github.com/wenquanlu/HandRefiner .
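The core idea of conditional inpainting described in the abstract is that only the masked hand region is regenerated, while every other pixel of the image is copied through unchanged. The sketch below is a minimal, hypothetical illustration in NumPy of that final masked compositing step (the function name `blend_inpaint` is ours, not from the HandRefiner code); the actual method additionally conditions a ControlNet on a depth map rendered from the reconstructed hand mesh to guide what the diffusion model paints inside the mask.

```python
import numpy as np

def blend_inpaint(original: np.ndarray, refined: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Composite a refined (inpainted) region back into the original image.

    original: H x W x C image containing the malformed hands
    refined:  H x W x C image produced by the diffusion inpainting pass
    mask:     H x W binary mask, 1 where the hand region should be replaced
    """
    m = mask[..., None].astype(original.dtype)  # broadcast the mask over channels
    return m * refined + (1.0 - m) * original

# Toy example: a 2x2 RGB image where only the top-left pixel lies in the mask.
original = np.zeros((2, 2, 3))
refined = np.ones((2, 2, 3))
mask = np.array([[1, 0], [0, 0]])
out = blend_inpaint(original, refined, mask)
# Only the masked pixel is replaced; the rest of the image stays untouched.
```

Because unmasked pixels are copied verbatim, the refinement can only ever change the hand region, which is what makes the approach a safe post-processing step on top of an already-generated image.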
Related papers
- HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images [20.81706200561224]
We propose HandCraft, a method for restoring such malformed hands.
This is achieved by automatically constructing masks and depth images for hands as conditioning signals.
Our plug-and-play hand restoration solution is compatible with existing pretrained diffusion models.
arXiv Detail & Related papers (2024-11-07T00:14:39Z) - RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance [41.213241942526935]
Diffusion models can generate high-quality human images, but their applications are limited by instability in generating hands with correct structure.
We propose a conditional diffusion-based framework RHanDS to refine the hand region with the help of decoupled structure and style guidance.
Experimental results show that RHanDS refines hands more effectively, in both structure and style, than previous methods.
arXiv Detail & Related papers (2024-04-22T08:44:34Z) - HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud [60.47544798202017]
Hand pose estimation is a critical task in various human-computer interaction applications.
This paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises accurate hand pose conditioned on hand-shaped image-point clouds.
Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets.
arXiv Detail & Related papers (2024-04-04T02:15:16Z) - Giving a Hand to Diffusion Models: a Two-Stage Approach to Improving Conditional Human Image Generation [29.79050316749927]
We introduce a novel approach to pose-conditioned human image generation, dividing the process into two stages: hand generation and subsequent body outpainting around the hands.
A novel blending technique, which combines the results of both stages in a coherent way, is introduced to preserve hand details during the second stage.
Our approach not only enhances the quality of the generated hands but also offers improved control over hand pose, advancing the capabilities of pose-conditioned human image generation.
arXiv Detail & Related papers (2024-03-15T23:31:41Z) - HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances [34.50137847908887]
Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands.
Common artifacts include irregular hand poses and shapes, incorrect finger counts, and physically implausible finger orientations.
We propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings in the generative process.
arXiv Detail & Related papers (2024-03-04T03:00:22Z) - HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z) - Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation [59.3035531612715]
Existing methods often struggle to generate plausible hand poses when the hand is heavily occluded or blurred.
In videos, the movements of the hand allow us to observe various parts of the hand that may be occluded or blurred in a single frame.
We propose the Deformer: a framework that implicitly reasons about the relationship between hand parts within the same image.
arXiv Detail & Related papers (2023-03-09T02:24:30Z) - Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes [58.551154822792284]
Implicit Two Hands (Im2Hands) is the first neural implicit representation of two interacting hands.
Im2Hands can produce fine-grained geometry of two hands with high hand-to-hand and hand-to-image coherency.
We experimentally demonstrate the effectiveness of Im2Hands on two-hand reconstruction in comparison to related methods.
arXiv Detail & Related papers (2023-02-28T06:38:25Z) - Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.