3D Hand Mesh-Guided AI-Generated Malformed Hand Refinement with Hand Pose Transformation via Diffusion Model
- URL: http://arxiv.org/abs/2506.12680v2
- Date: Tue, 17 Jun 2025 02:48:47 GMT
- Title: 3D Hand Mesh-Guided AI-Generated Malformed Hand Refinement with Hand Pose Transformation via Diffusion Model
- Authors: Chen-Bin Feng, Kangdao Liu, Jian Sun, Jiping Jin, Yiguo Jiang, Chi-Man Vong
- Abstract summary: We propose a 3D mesh-guided refinement framework using a diffusion pipeline. For training, we collect and re-annotate a dataset of RGB images and 3D hand meshes. We then design a diffusion inpainting model to generate refined outputs guided by 3D hand meshes.
- Score: 40.20849519857311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Malformed hands in AI-generated images seriously undermine the images' authenticity. To refine malformed hands, existing depth-based approaches use a hand depth estimator to guide the refinement. Because of the performance limitations of the hand depth estimator, many hand details cannot be represented, which leads to errors in the generated hands, such as confusing the palm with the back of the hand. To solve this problem, we propose a 3D mesh-guided refinement framework using a diffusion pipeline. We use a state-of-the-art 3D hand mesh estimator, which captures more hand detail. For training, we collect and re-annotate a dataset of RGB images and 3D hand meshes. We then design a diffusion inpainting model to generate refined outputs guided by 3D hand meshes. For inference, we propose a double-check algorithm that helps the 3D hand mesh estimator obtain robust hand mesh guidance, from which we produce our refined results. Beyond malformed hand refinement, we propose a novel hand pose transformation method that makes the restored images mimic the hand poses of reference images, increasing the flexibility and diversity of the malformed hand refinement task. The pose transformation requires no additional training. Extensive experimental results demonstrate the superior performance of our proposed method.
Related papers
- HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images [20.81706200561224]
We propose a method, HandCraft, for restoring malformed hands in diffusion-generated images. This is achieved by automatically constructing masks and depth images for hands as conditioning signals. Our plug-and-play hand restoration solution is compatible with existing pretrained diffusion models.
(2024-11-07T00:14:39Z)
- AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild [18.351368674337134]
AttentionHand is a novel method for text-driven controllable hand image generation.
It can generate numerous, diverse in-the-wild hand images that are well aligned with 3D hand labels.
It achieves state-of-the-art performance among text-to-hand image generation models.
(2024-07-25T13:29:32Z)
- HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud [60.47544798202017]
Hand pose estimation is a critical task in various human-computer interaction applications.
This paper proposes HandDiff, a diffusion-based hand pose estimation model that iteratively denoises accurate hand pose conditioned on hand-shaped image-point clouds.
Experimental results demonstrate that the proposed HandDiff significantly outperforms the existing approaches on four challenging hand pose benchmark datasets.
(2024-04-04T02:15:16Z)
- HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting [72.95232302438207]
Diffusion models have achieved remarkable success in generating realistic images.
However, they struggle to generate accurate human hands, producing errors such as incorrect finger counts or irregular shapes.
This paper introduces a lightweight post-processing solution called HandRefiner.
(2023-11-29T08:52:08Z)
- Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation [59.3035531612715]
Existing methods often struggle to generate plausible hand poses when the hand is heavily occluded or blurred.
In videos, the movements of the hand allow us to observe various parts of the hand that may be occluded or blurred in a single frame.
We propose the Deformer: a framework that implicitly reasons about the relationship between hand parts within the same image.
(2023-03-09T02:24:30Z)
- Model-based 3D Hand Reconstruction via Self-Supervised Learning [72.0817813032385]
Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity.
We propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint.
For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations.
(2021-03-22T10:12:43Z)
- MM-Hand: 3D-Aware Multi-Modal Guided Hand Generative Network for 3D Hand Pose Synthesis [81.40640219844197]
Estimating the 3D hand pose from a monocular RGB image is important but challenging.
A solution is training on large-scale RGB hand images with accurate 3D hand keypoint annotations.
We have developed a learning-based approach to synthesize realistic, diverse, and 3D pose-preserving hand images.
(2020-10-02T18:27:34Z)