FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation
- URL: http://arxiv.org/abs/2412.02690v2
- Date: Wed, 04 Dec 2024 20:51:17 GMT
- Title: FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation
- Authors: Kefan Chen, Chaerin Min, Linguang Zhang, Shreyas Hampali, Cem Keskin, Srinath Sridhar
- Abstract summary: We present FoundHand, a large-scale domain-specific diffusion model for single and dual hand images.
We use FoundHand-10M, a large-scale hand dataset with 2D keypoints and segmentation mask annotations.
Our model exhibits core capabilities that include the ability to repose hands, transfer hand appearance, and even synthesize novel views.
- Abstract: Despite remarkable progress in image generation models, generating realistic hands remains a persistent challenge due to their complex articulation, varying viewpoints, and frequent occlusions. We present FoundHand, a large-scale domain-specific diffusion model for synthesizing single and dual hand images. To train our model, we introduce FoundHand-10M, a large-scale hand dataset with 2D keypoints and segmentation mask annotations. Our insight is to use 2D hand keypoints as a universal representation that encodes both hand articulation and camera viewpoint. FoundHand learns from image pairs to capture physically plausible hand articulations, natively enables precise control through 2D keypoints, and supports appearance control. Our model exhibits core capabilities that include the ability to repose hands, transfer hand appearance, and even synthesize novel views. This leads to zero-shot capabilities for fixing malformed hands in previously generated images, or synthesizing hand video sequences. We present extensive experiments and evaluations that demonstrate state-of-the-art performance of our method.
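The abstract's key insight is to use 2D hand keypoints as a universal conditioning signal that encodes both articulation and viewpoint. A common way to feed sparse keypoints into an image diffusion model is to render them as per-joint Gaussian heatmaps. The sketch below illustrates that encoding; the 21-joint convention, resolution, and heatmap scheme are illustrative assumptions, not FoundHand's confirmed interface:

```python
import numpy as np

def keypoints_to_heatmaps(keypoints, image_size=64, sigma=2.0):
    """Render 2D hand keypoints as one Gaussian heatmap per joint.

    keypoints: (K, 2) array of (x, y) pixel coordinates.
    Returns a (K, image_size, image_size) float32 conditioning tensor
    that a diffusion model could concatenate with its noised input.
    """
    k = keypoints.shape[0]
    ys, xs = np.mgrid[0:image_size, 0:image_size]
    maps = np.zeros((k, image_size, image_size), dtype=np.float32)
    for i, (x, y) in enumerate(keypoints):
        # Peak of 1.0 at the joint location, falling off with distance.
        maps[i] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps

# Hypothetical 21-joint hand (the standard single-hand keypoint count).
kps = np.random.uniform(8, 56, size=(21, 2))
cond = keypoints_to_heatmaps(kps)  # shape (21, 64, 64)
```

Because the heatmaps live in image space, the same representation captures both pose (relative joint positions) and camera viewpoint (their absolute 2D layout), which is what makes it usable for reposing, appearance transfer, and novel-view synthesis alike.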
Related papers
- Adaptive Multi-Modal Control of Digital Human Hand Synthesis Using a Region-Aware Cycle Loss [12.565642618427844]
Diffusion models can synthesize images, including humans in specific poses.
Current models face challenges in adequately expressing conditional control for detailed hand pose generation.
We propose a novel Region-Aware Cycle Loss (RACL) that enables the diffusion model training to focus on improving the hand region.
arXiv Detail & Related papers (2024-09-13T19:09:19Z)
- HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud [60.47544798202017]
Hand pose estimation is a critical task in various human-computer interaction applications.
This paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises accurate hand pose conditioned on hand-shaped image-point clouds.
Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets.
arXiv Detail & Related papers (2024-04-04T02:15:16Z)
- HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances [34.50137847908887]
Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands.
Common artifacts include irregular hand poses and shapes, incorrect finger counts, and physically implausible finger orientations.
We propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings in the generative process.
arXiv Detail & Related papers (2024-03-04T03:00:22Z)
- HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting [72.95232302438207]
Diffusion models have achieved remarkable success in generating realistic images.
However, they struggle to generate accurate human hands, producing errors such as incorrect finger counts or irregular shapes.
This paper introduces a lightweight post-processing solution called HandRefiner.
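Conditional inpainting of this kind typically masks out the malformed hand region and re-synthesizes only that area while the rest of the image is kept fixed. A minimal sketch of assembling such an inpainting input; the exact channel layout (masked image plus a mask channel) is a common convention and an assumption here, not HandRefiner's confirmed interface:

```python
import numpy as np

def make_inpainting_input(image, hand_mask):
    """Build a conditioning tensor for diffusion-based inpainting:
    the image with the hand region zeroed out, plus the binary mask
    appended as an extra channel so the model knows what to re-synthesize.

    image: (H, W, 3) float array in [0, 1].
    hand_mask: (H, W) binary array, 1 inside the region to repaint.
    """
    assert image.ndim == 3 and hand_mask.shape == image.shape[:2]
    masked = image * (1.0 - hand_mask)[..., None]  # hide the malformed hand
    return np.concatenate([masked, hand_mask[..., None]], axis=-1)

img = np.random.rand(64, 64, 3).astype(np.float32)
mask = np.zeros((64, 64), dtype=np.float32)
mask[20:40, 20:40] = 1.0  # hypothetical malformed-hand bounding region
cond = make_inpainting_input(img, mask)  # shape (64, 64, 4)
```

Because only the masked region is regenerated, the approach works as a lightweight post-processing step on top of any existing generator, which is also how FoundHand's zero-shot hand-fixing capability is framed in the abstract above.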
arXiv Detail & Related papers (2023-11-29T08:52:08Z)
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z)
- Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation [59.3035531612715]
Existing methods often struggle to generate plausible hand poses when the hand is heavily occluded or blurred.
In videos, the movements of the hand allow us to observe various parts of the hand that may be occluded or blurred in a single frame.
We propose the Deformer: a framework that implicitly reasons about the relationship between hand parts within the same image.
arXiv Detail & Related papers (2023-03-09T02:24:30Z)
- Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes [58.551154822792284]
Implicit Two Hands (Im2Hands) is the first neural implicit representation of two interacting hands.
Im2Hands can produce fine-grained geometry of two hands with high hand-to-hand and hand-to-image coherency.
We experimentally demonstrate the effectiveness of Im2Hands on two-hand reconstruction in comparison to related methods.
arXiv Detail & Related papers (2023-02-28T06:38:25Z)
- MM-Hand: 3D-Aware Multi-Modal Guided Hand Generative Network for 3D Hand Pose Synthesis [81.40640219844197]
Estimating the 3D hand pose from a monocular RGB image is important but challenging.
One solution is to train on large-scale RGB hand images with accurate 3D hand keypoint annotations.
We have developed a learning-based approach to synthesize realistic, diverse, and 3D pose-preserving hand images.
arXiv Detail & Related papers (2020-10-02T18:27:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.