Related papers: RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance

URL: http://arxiv.org/abs/2404.13984v1
Date: Mon, 22 Apr 2024 08:44:34 GMT
Title: RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance
Authors: Chengrui Wang, Pengfei Liu, Min Zhou, Ming Zeng, Xubin Li, Tiezheng Ge, Bo Zheng
Abstract summary: Diffusion models can generate high-quality human images, but their applications are limited by the instability in generating hands with correct structures. We propose a conditional diffusion-based framework RHanDS to refine the hand region with the help of decoupled structure and style guidance. The experimental results show that RHanDS can effectively refine hands with correct structure and style compared with previous methods.
Score: 41.213241942526935
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although diffusion models can generate high-quality human images, their applications are limited by the instability in generating hands with correct structures. Some previous works mitigate the problem by considering hand structure yet struggle to maintain style consistency between refined malformed hands and other image regions. In this paper, we aim to solve the problem of inconsistency regarding hand structure and style. We propose a conditional diffusion-based framework RHanDS to refine the hand region with the help of decoupled structure and style guidance. Specifically, the structure guidance is the hand mesh reconstructed from the malformed hand, serving to correct the hand structure. The style guidance is a hand image, e.g., the malformed hand itself, and is employed to furnish the style reference for hand refining. In order to suppress the structure leakage when referencing hand style and effectively utilize hand data to improve the capability of the model, we build a multi-style hand dataset and introduce a two-stage training strategy. In the first stage, we use paired hand images for training to generate hands with the same style as the reference. In the second stage, various hand images generated based on the human mesh are used for training to enable the model to gain control over the hand structure. We evaluate our method and counterparts on the test dataset of the proposed multi-style hand dataset. The experimental results show that RHanDS can effectively refine hands with correct structure and style compared with previous methods. The code and datasets will be available soon.

Related papers

SesaHand: Enhancing 3D Hand Reconstruction via Controllable Generation with Semantic and Structural Alignment [38.103458669002684]
Generative models are promising alternatives to generate diverse hand images, but still suffer from misalignment issues. We present SesaHand, which enhances controllable hand image generation from both semantic and structural alignment perspectives. Experiments demonstrate that our method not only outperforms prior work in generation performance but also improves 3D hand reconstruction with the generated hand images.
arXiv Detail & Related papers (2026-02-28T03:51:51Z)
3D Hand Mesh-Guided AI-Generated Malformed Hand Refinement with Hand Pose Transformation via Diffusion Model [40.20849519857311]
We propose a 3D mesh-guided refinement framework using a diffusion pipeline. For training, we collect and reannotate a dataset consisting of RGB images and 3D hand mesh. We then design a diffusion inpainting model to generate refined outputs guided by 3D hand meshes.
arXiv Detail & Related papers (2025-06-15T01:30:22Z)
Aligning Foundation Model Priors and Diffusion-Based Hand Interactions for Occlusion-Resistant Two-Hand Reconstruction [50.952228546326516]
Two-hand reconstruction from monocular images faces persistent challenges due to complex and dynamic hand postures and occlusions. Existing approaches struggle with such alignment issues, often resulting in misalignment and penetration artifacts. We propose a novel framework that attempts to precisely align hand poses and interactions by integrating foundation model-driven 2D priors with diffusion-based interaction refinement.
arXiv Detail & Related papers (2025-03-22T14:42:27Z)
MGHanD: Multi-modal Guidance for authentic Hand Diffusion [25.887930576638293]
MGHanD addresses persistent challenges in generating realistic human hands. We employ a discriminator trained on a dataset comprising paired real and generated images with captions. We also employ textual guidance with a LoRA adapter, which learns the direction from 'hands' towards more detailed prompts.
arXiv Detail & Related papers (2025-03-11T07:51:47Z)
HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images [20.81706200561224]
We propose a method HandCraft for restoring such malformed hands. This is achieved by automatically constructing masks and depth images for hands as conditioning signals. Our plug-and-play hand restoration solution is compatible with existing pretrained diffusion models.
arXiv Detail & Related papers (2024-11-07T00:14:39Z)
Giving a Hand to Diffusion Models: a Two-Stage Approach to Improving Conditional Human Image Generation [29.79050316749927]
We introduce a novel approach to pose-conditioned human image generation, dividing the process into two stages: hand generation and subsequent body outpainting around the hands. A novel blending technique is introduced to preserve the hand details during the second stage that combines the results of both stages in a coherent way. Our approach not only enhances the quality of the generated hands but also offers improved control over hand pose, advancing the capabilities of pose-conditioned human image generation.
arXiv Detail & Related papers (2024-03-15T23:31:41Z)
HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances [34.50137847908887]
Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands. Common artifacts include irregular hand poses, shapes, incorrect numbers of fingers, and physically implausible finger orientations. We propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings in the generative process.
arXiv Detail & Related papers (2024-03-04T03:00:22Z)
HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting [72.95232302438207]
Diffusion models have achieved remarkable success in generating realistic images. But they struggle to generate accurate human hands, often producing incorrect finger counts or irregular shapes. This paper introduces a lightweight post-processing solution called HandRefiner.
arXiv Detail & Related papers (2023-11-29T08:52:08Z)
Overcoming the Trade-off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction [62.96478903239799]
Direct mesh fitting for 3D hand shape reconstruction is highly accurate. However, the reconstructed meshes are prone to artifacts and do not appear as plausible hand shapes. We introduce a novel weakly-supervised hand shape estimation framework that integrates non-parametric mesh fitting with the MANO model in an end-to-end fashion.
arXiv Detail & Related papers (2023-05-01T03:38:01Z)
HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands. We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z)
Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation [59.3035531612715]
Existing methods often struggle to generate plausible hand poses when the hand is heavily occluded or blurred. In videos, the movements of the hand allow us to observe various parts of the hand that may be occluded or blurred in a single frame. We propose the Deformer: a framework that implicitly reasons about the relationship between hand parts within the same image.
arXiv Detail & Related papers (2023-03-09T02:24:30Z)
Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes [58.551154822792284]
Implicit Two Hands (Im2Hands) is the first neural implicit representation of two interacting hands. Im2Hands can produce fine-grained geometry of two hands with high hand-to-hand and hand-to-image coherency. We experimentally demonstrate the effectiveness of Im2Hands on two-hand reconstruction in comparison to related methods.
arXiv Detail & Related papers (2023-02-28T06:38:25Z)
Sketch-Guided Text-to-Image Diffusion Models [57.12095262189362]
We introduce a universal approach to guide a pretrained text-to-image diffusion model. Our method does not require training a dedicated model or a specialized encoder for the task. We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images.
arXiv Detail & Related papers (2022-11-24T18:45:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.