HanDrawer: Leveraging Spatial Information to Render Realistic Hands Using a Conditional Diffusion Model in Single Stage
- URL: http://arxiv.org/abs/2503.02127v1
- Date: Mon, 03 Mar 2025 23:29:33 GMT
- Title: HanDrawer: Leveraging Spatial Information to Render Realistic Hands Using a Conditional Diffusion Model in Single Stage
- Authors: Qifan Fu, Xu Chen, Muhammad Asad, Shanxin Yuan, Changjae Oh, Gregory Slabaugh
- Abstract summary: We propose HanDrawer, a module to condition the hand generation process. The spatially fused features are used to guide a single-stage diffusion model's denoising process. HanDrawer learns features over the entire image while paying special attention to the hand region.
- Score: 16.890823951821396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although diffusion methods excel in text-to-image generation, generating accurate hand gestures remains a major challenge, resulting in severe artifacts such as an incorrect number of fingers or unnatural gestures. To enable the diffusion model to learn spatial information that improves the quality of the generated hands, we propose HanDrawer, a module that conditions the hand generation process. Specifically, we apply graph convolutional layers to extract the endogenous spatial structure and physical constraints implicit in MANO hand mesh vertices. We then align and fuse these spatial features with other modalities via cross-attention. The spatially fused features guide the denoising process of a single-stage diffusion model for high-quality generation of the hand region. To improve the accuracy of spatial feature fusion, we propose a Position-Preserving Zero Padding (PPZP) fusion strategy, which ensures that the features extracted by HanDrawer are fused into the region of interest in the relevant layers of the diffusion model. HanDrawer learns features over the entire image while paying special attention to the hand region, thanks to an additional hand reconstruction loss combined with the denoising loss. To accurately train and evaluate our approach, we perform careful cleansing and relabeling of the widely used HaGRID hand gesture dataset and obtain high-quality multimodal data. Quantitative and qualitative analyses demonstrate the state-of-the-art performance of our method on the HaGRID dataset across multiple evaluation metrics. Source code and our enhanced dataset will be released publicly if the paper is accepted.
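To make the two mechanisms named in the abstract concrete, here is a minimal PyTorch sketch of PPZP-style fusion and of the combined training objective. All names, shapes, the additive fusion, and the weighting factor `lam` are illustrative assumptions based only on the abstract, not the authors' released implementation.

```python
# Minimal sketch of the PPZP idea: hand features are zero-padded into a
# full-size canvas at the hand's location, so fusing them into a diffusion
# UNet layer only touches the region of interest. Illustrative assumption.
import torch
import torch.nn.functional as F

def ppzp_fuse(unet_feat, hand_feat, bbox):
    """Fuse hand-region features into a UNet feature map at the hand's position.

    unet_feat: (B, C, H, W) activations from one diffusion UNet layer.
    hand_feat: (B, C, h, w) features extracted for the hand crop.
    bbox:      (top, left) of the hand crop in feature-map coordinates.
    """
    _, _, H, W = unet_feat.shape
    _, _, h, w = hand_feat.shape
    top, left = bbox
    assert top + h <= H and left + w <= W, "hand crop must fit in the map"
    # Position-preserving zero padding: a zero canvas everywhere except
    # the hand's location.
    padded = torch.zeros_like(unet_feat)
    padded[:, :, top:top + h, left:left + w] = hand_feat
    return unet_feat + padded  # zeros leave non-hand regions unchanged

def handrawer_style_loss(noise_pred, noise_gt, hand_pred, hand_gt, lam=1.0):
    """Denoising loss over the whole image plus a hand reconstruction term."""
    return F.mse_loss(noise_pred, noise_gt) + lam * F.mse_loss(hand_pred, hand_gt)
```

Because the padded canvas is zero outside the hand bounding box, the additive fusion leaves non-hand UNet features unchanged, which matches the abstract's claim that HanDrawer's features are fused only into the region of interest.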
Related papers
- UniHands: Unifying Various Wild-Collected Keypoints for Personalized Hand Reconstruction [4.0025708029346445]
We present UniHands, a novel method for creating standardized yet personalized hand models.
Unlike existing neural implicit representation methods, UniHands uses the widely adopted parametric models MANO and NIMBLE.
It also derives unified hand joints from the meshes, which facilitates seamless integration into various hand-related tasks.
arXiv Detail & Related papers (2024-11-18T18:59:58Z)
- Adaptive Multi-Modal Control of Digital Human Hand Synthesis Using a Region-Aware Cycle Loss [12.565642618427844]
Diffusion models can synthesize images, including the generation of humans in specific poses.
Current models face challenges in adequately expressing conditional control for detailed hand pose generation.
We propose a novel Region-Aware Cycle Loss (RACL) that enables the diffusion model training to focus on improving the hand region; a hypothetical sketch of a region-weighted loss follows this entry.
arXiv Detail & Related papers (2024-09-13T19:09:19Z)
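The summary above only states that RACL focuses training on the hand region; one plausible reading is a pixel-wise weighted denoising loss with extra weight inside a hand mask. The sketch below is that hypothetical reading, with `hand_weight` and all names assumed; the actual RACL formulation (a cycle loss) may differ.

```python
# Hypothetical region-aware training loss in the spirit of RACL: up-weight
# the per-pixel denoising loss inside a hand-region mask. Not the paper's
# actual formulation.
import torch.nn.functional as F

def region_aware_loss(noise_pred, noise_gt, hand_mask, hand_weight=5.0):
    # hand_mask: (B, 1, H, W) binary mask of the hand region, broadcast
    # across channels of the (B, C, H, W) prediction.
    per_pixel = F.mse_loss(noise_pred, noise_gt, reduction="none")
    weights = 1.0 + (hand_weight - 1.0) * hand_mask
    return (weights * per_pixel).mean()
```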
- XHand: Real-time Expressive Hand Avatar [9.876680405587745]
We introduce an expressive hand avatar, named XHand, that is designed to generate hand shape, appearance, and deformations in real time.
XHand is able to recover high-fidelity geometry and texture for hand animations across diverse poses in real time.
arXiv Detail & Related papers (2024-07-30T17:49:21Z)
- Fine-Grained Multi-View Hand Reconstruction Using Inverse Rendering [11.228453237603834]
We present a novel fine-grained multi-view hand mesh reconstruction method that leverages inverse rendering to restore hand poses and intricate details.
We also introduce a novel Hand Albedo and Mesh (HAM) optimization module to refine both the hand mesh and textures.
Our proposed approach outperforms the state-of-the-art methods on both reconstruction accuracy and rendering quality.
arXiv Detail & Related papers (2024-07-08T07:28:24Z)
- HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud [60.47544798202017]
Hand pose estimation is a critical task in various human-computer interaction applications.
This paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises an accurate hand pose conditioned on hand-shaped image-point clouds (a schematic sampling loop is sketched after this entry).
Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets.
arXiv Detail & Related papers (2024-04-04T02:15:16Z)
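The HandDiff summary above describes iterative denoising of hand pose conditioned on image-point clouds. The following generic DDPM-style sampling loop over 3D joint coordinates illustrates the idea; the `denoiser` network, the conditioning features, and the linear noise schedule are placeholders, not the HandDiff architecture.

```python
# Schematic DDPM ancestral sampling of 3D hand joints conditioned on
# point-cloud features. All modules and the schedule are assumptions.
import torch

def sample_hand_pose(denoiser, cond_feat, num_joints=21, steps=50):
    betas = torch.linspace(1e-4, 0.02, steps)        # assumed schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    pose = torch.randn(1, num_joints, 3)             # start from noise
    for t in reversed(range(steps)):
        # The denoiser predicts the noise in `pose` given the timestep
        # and the point-cloud conditioning features.
        eps = denoiser(pose, torch.tensor([t]), cond_feat)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        pose = (pose - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:                                    # no noise at the last step
            pose = pose + torch.sqrt(betas[t]) * torch.randn_like(pose)
    return pose
```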
- HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to increase data diversity and boost 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z)
- HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances [34.50137847908887]
Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands.
Common artifacts include irregular hand poses and shapes, incorrect numbers of fingers, and physically implausible finger orientations.
We propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings in the generative process.
arXiv Detail & Related papers (2024-03-04T03:00:22Z)
- SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation [68.42476385214785]
We propose a novel Spatial-Semantic Map Guided (SSMG) diffusion model that adopts the feature map, derived from the layout, as guidance (a generic construction of such a map is sketched after this entry).
SSMG achieves superior generation quality with sufficient spatial and semantic controllability compared to previous works.
We also propose the Relation-Sensitive Attention (RSA) and Location-Sensitive Attention (LSA) mechanisms.
arXiv Detail & Related papers (2023-08-20T04:09:12Z)
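SSMG's guidance signal, per the summary above, is a feature map derived from the layout. One generic way to build such a spatial-semantic map is to rasterize labeled boxes into a grid of label embeddings and encode it to the UNet's channel width, as sketched below; the sizes, the embedding table, and the conv encoder are all assumptions, not the paper's design.

```python
# Illustrative layout-to-feature-map guidance in the spirit of SSMG.
import torch
import torch.nn as nn

NUM_CLASSES, EMBED_DIM, UNET_CH = 10, 16, 320      # assumed sizes
label_embed = nn.Embedding(NUM_CLASSES, EMBED_DIM)
encoder = nn.Conv2d(EMBED_DIM, UNET_CH, kernel_size=3, padding=1)

def spatial_semantic_map(boxes, labels, H=64, W=64):
    # boxes: list of (top, left, bottom, right) in feature-map coordinates;
    # labels: matching class indices. Later boxes overwrite earlier ones.
    smap = torch.zeros(1, EMBED_DIM, H, W)
    for (top, left, bottom, right), lab in zip(boxes, labels):
        emb = label_embed(torch.tensor(lab)).view(1, EMBED_DIM, 1, 1)
        smap[:, :, top:bottom, left:right] = emb
    return encoder(smap)                            # guidance feature map
```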
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and have exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z)
- Sketch-Guided Text-to-Image Diffusion Models [57.12095262189362]
We introduce a universal approach to guide a pretrained text-to-image diffusion model.
Our method does not require training a dedicated model or a specialized encoder for the task.
We focus in particular on the sketch-to-image translation task, revealing a robust and expressive way to generate images.
arXiv Detail & Related papers (2022-11-24T18:45:32Z)