HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
- URL: http://arxiv.org/abs/2407.06937v1
- Date: Tue, 9 Jul 2024 15:14:41 GMT
- Title: HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
- Authors: Guian Fang, Wenbiao Yan, Yuanfan Guo, Jianhua Han, Zutao Jiang, Hang Xu, Shengcai Liao, Xiaodan Liang
- Abstract summary: AbHuman is the first large-scale synthesized human benchmark focusing on anatomical anomalies.
HumanRefiner is a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation.
- Score: 80.97360194728705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image diffusion models have significantly advanced in conditional image generation. However, these models usually struggle with accurately rendering images featuring humans, resulting in distorted limbs and other anomalies. This issue primarily stems from the insufficient recognition and evaluation of limb qualities in diffusion models. To address this issue, we introduce AbHuman, the first large-scale synthesized human benchmark focusing on anatomical anomalies. This benchmark consists of 56K synthesized human images, each annotated with detailed, bounding-box level labels identifying 147K human anomalies in 18 different categories. Based on this, the recognition of human anomalies can be established, which in turn enhances image generation through traditional techniques such as negative prompting and guidance. To further boost the improvement, we propose HumanRefiner, a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation. Specifically, HumanRefiner utilizes a self-diagnostic procedure to detect and correct issues related to both coarse-grained abnormal human poses and fine-grained anomaly levels, facilitating pose-reversible diffusion generation. Experimental results on the AbHuman benchmark demonstrate that HumanRefiner significantly reduces generative discrepancies, achieving a 2.9x improvement in limb quality compared to the state-of-the-art open-source generator SDXL and a 1.4x improvement over DALL-E 3 in human evaluations. Our data and code are available at https://github.com/Enderfga/HumanRefiner.
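As a rough sketch of the self-diagnostic loop described in the abstract, the snippet below regenerates an image with negative prompts built from detected anomaly categories. Here `detect_anomalies` is a hypothetical stand-in for a detector trained on AbHuman's 18 anomaly categories; the full method additionally corrects the coarse pose and conditions regeneration on it (the "pose-reversible" guidance), which is omitted in this sketch.

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

def self_diagnostic_refine(prompt, detect_anomalies, max_rounds=3, tol=0.1):
    """Generate, self-diagnose limb anomalies, and regenerate until clean."""
    image = pipe(prompt).images[0]
    for _ in range(max_rounds):
        # Hypothetical detector: returns an anomaly score and the detected
        # category names, e.g. (0.42, ["extra arm", "fused fingers"]).
        score, categories = detect_anomalies(image)
        if score < tol:
            break  # the draft passes the self-diagnosis
        # Push the sampler away from the detected failure modes.
        image = pipe(prompt, negative_prompt=", ".join(categories)).images[0]
    return image
```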
Related papers
- MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts [61.274246025372044]
We study human-centric text-to-image generation in the context of faces and hands.
We propose a method called Mixture of Low-rank Experts (MoLE) by considering low-rank modules trained on close-up hand and face images respectively as experts.
This concept draws inspiration from our observation of low-rank refinement, where a low-rank module trained on a customized close-up dataset has the potential to enhance the corresponding image part when applied at an appropriate scale.
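The core idea lends itself to a compact sketch: two LoRA branches, one tuned on close-up faces and one on hands, are blended into a frozen base layer by a learned gate. Module and parameter names below are illustrative, assuming nothing about MoLE's actual code.

```python
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    def __init__(self, dim, rank=4):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # start as a no-op around the base layer

    def forward(self, x):
        return self.up(self.down(x))

class MoLELinear(nn.Module):
    def __init__(self, base: nn.Linear, experts: list[LoRAExpert]):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pretrained weight
        self.experts = nn.ModuleList(experts)
        # Soft gate: how strongly each expert is applied, per token.
        self.gate = nn.Linear(base.in_features, len(experts))

    def forward(self, x):
        w = torch.softmax(self.gate(x), dim=-1)                     # (..., n_experts)
        delta = torch.stack([e(x) for e in self.experts], dim=-1)   # (..., dim, n_experts)
        return self.base(x) + (delta * w.unsqueeze(-2)).sum(-1)

# e.g. wrap one attention projection with a face expert and a hand expert
layer = MoLELinear(nn.Linear(768, 768), [LoRAExpert(768), LoRAExpert(768)])
out = layer(torch.randn(2, 77, 768))
```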
arXiv Detail & Related papers (2024-10-30T17:59:57Z)
- Towards Automation of Human Stage of Decay Identification: An Artificial Intelligence Approach [3.2048813174244795]
This study explores the feasibility of automating two common human decomposition scoring methods using artificial intelligence (AI).
We evaluated two popular deep learning models, Inception V3 and Xception, by training them on a large dataset of human decomposition images.
The Xception model achieved the best classification performance, with macro-averaged F1 scores of 0.878, 0.881, and 0.702 for the head, torso, and limbs, respectively.
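A minimal sketch of this setup, assuming a five-stage scoring scale and torchvision's Inception V3 (one of the two evaluated backbones); macro averaging gives every stage equal weight, which matters when late decay stages are rare.

```python
import torch.nn as nn
from torchvision import models
from sklearn.metrics import f1_score

NUM_STAGES = 5  # assumed number of decomposition scores per body region

# Replace the ImageNet classifier head with a decay-stage head.
model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_STAGES)

# ... fine-tune on region-specific crops (head / torso / limbs) ...

def macro_f1(y_true, y_pred):
    # Macro averaging weights every stage equally, so rare late
    # stages count as much as common early ones.
    return f1_score(y_true, y_pred, average="macro")
```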
arXiv Detail & Related papers (2024-08-19T21:00:40Z)
- Generalizable Human Gaussians from Single-View Image [52.100234836129786]
We introduce a single-view generalizable Human Gaussian Model (HGM).
Our approach uses a ControlNet to refine rendered back-view images from coarse predicted human Gaussians.
To mitigate the potential generation of unrealistic human poses and shapes, we incorporate human priors from the SMPL-X model as a dual branch.
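Purely as an illustration of pose-conditioned back-view refinement, the sketch below runs a rendered coarse back view through an off-the-shelf img2img ControlNet; the checkpoints and input file names are assumptions, not HGM's actual pipeline.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

coarse_back_render = Image.open("back_render.png")  # hypothetical render from the predicted Gaussians
pose_map = Image.open("pose_map.png")               # hypothetical pose image from the SMPL-X prior

refined = pipe(
    prompt="photo of a person's back, detailed clothing",
    image=coarse_back_render,   # keep the render's overall content
    control_image=pose_map,     # pose prior keeps the result anatomically plausible
    strength=0.5,               # how much the refiner is allowed to repaint
).images[0]
```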
arXiv Detail & Related papers (2024-06-10T06:38:11Z)
- AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe the human joints in the image and encode a fine-grained local feature.
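A toy, DETR-style rendering of this two-step token probing (not AiOS's actual architecture): a learnable human token pools a global instance feature, then joint tokens attend to the feature map for fine-grained local features. Sharing one attention module for both steps is a simplification.

```python
import torch
import torch.nn as nn

class TokenProbe(nn.Module):
    def __init__(self, dim=256, num_joints=17):
        super().__init__()
        self.human_token = nn.Parameter(torch.randn(1, 1, dim))
        self.joint_tokens = nn.Parameter(torch.randn(1, num_joints, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, feats):  # feats: (B, H*W, dim) flattened image features
        b = feats.size(0)
        # Step 1: human token probes the human location, pooling a global feature.
        human, _ = self.attn(self.human_token.expand(b, -1, -1), feats, feats)
        # Step 2: joint tokens, offset by the instance feature, probe local joint features.
        joints, _ = self.attn(self.joint_tokens.expand(b, -1, -1) + human, feats, feats)
        return human, joints

probe = TokenProbe()
human, joints = probe(torch.randn(2, 64 * 64, 256))
```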
arXiv Detail & Related papers (2024-03-26T17:59:23Z)
- Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation [24.49857926071974]
Vanilla text-to-image diffusion models struggle with generating accurate human images.
Existing methods address this issue mostly by fine-tuning the model with extra images or adding additional controls.
This paper explores the integration of human-centric priors directly into the model fine-tuning stage.
arXiv Detail & Related papers (2024-03-08T11:59:32Z)
- Towards the Detection of AI-Synthesized Human Face Images [12.090322373964124]
This paper presents a benchmark including human face images produced by Generative Adversarial Networks (GANs) and a variety of diffusion models (DMs).
The forgery traces introduced by different generative models are then analyzed in the frequency domain to draw various insights.
The paper further demonstrates that a detector trained with frequency representation can generalize well to other unseen generative models.
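The frequency-representation idea reduces to a short sketch: map each image to its log-amplitude spectrum and train a simple classifier on it. The classifier choice and preprocessing here are ours, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def log_spectrum(img_gray):
    """2D FFT log-amplitude spectrum; generator artifacts often
    show up here as periodic grid patterns."""
    f = np.fft.fftshift(np.fft.fft2(img_gray))
    return np.log1p(np.abs(f)).ravel()

def train_detector(real_imgs, fake_imgs):
    # real_imgs / fake_imgs: lists of grayscale faces as 2D float arrays (assumed format)
    X = np.stack([log_spectrum(x) for x in list(real_imgs) + list(fake_imgs)])
    y = np.array([0] * len(real_imgs) + [1] * len(fake_imgs))
    return LogisticRegression(max_iter=1000).fit(X, y)
```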
arXiv Detail & Related papers (2024-02-13T19:37:44Z)
- HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion [53.1558345421646]
We propose HumanRef, a 3D human generation framework from a single-view input.
To ensure the generated 3D model is photorealistic and consistent with the input image, HumanRef introduces a novel method called reference-guided score distillation sampling.
Experimental results demonstrate that HumanRef outperforms state-of-the-art methods in generating 3D clothed humans.
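For context, plain score distillation sampling (SDS), the mechanism the reference-guided variant builds on, can be sketched as below; `render`, `unet`, and `scheduler` are placeholders for a differentiable renderer and a pretrained diffusion model, and the timestep weighting is simplified.

```python
import torch

def sds_step(params, render, unet, scheduler, text_emb, w=100.0):
    """One generic SDS update; the reference-guided variant adds terms
    anchoring the result to the input view (simplified sketch)."""
    x = render(params)                        # differentiable render of the 3D human
    t = torch.randint(20, 980, (1,), device=x.device)
    noise = torch.randn_like(x)
    x_t = scheduler.add_noise(x, noise, t)    # diffuse the render
    with torch.no_grad():
        eps = unet(x_t, t, encoder_hidden_states=text_emb).sample
    # Push the render toward images the diffusion prior finds likely.
    x.backward(gradient=w * (eps - noise))    # gradient flows into `params`
```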
arXiv Detail & Related papers (2023-11-28T17:06:28Z)
- Exploring the Robustness of Human Parsers Towards Common Corruptions [99.89886010550836]
We construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models.
Inspired by the data augmentation strategy, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions.
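Assuming the benchmarks follow the ImageNet-C recipe, a corrupted evaluation split can be built with the `imagecorruptions` package roughly as follows; applying it to LIP/ATR/Pascal-Person-Part images is our reading of the construction, not the authors' released tooling.

```python
from imagecorruptions import corrupt, get_corruption_names

def corrupted_split(images, severities=(1, 3, 5)):
    """Yield (name, severity, corrupted image) for every common corruption."""
    for name in get_corruption_names():   # gaussian_noise, fog, motion_blur, ...
        for sev in severities:
            for img in images:            # img: HxWx3 uint8 numpy array
                yield name, sev, corrupt(img, corruption_name=name, severity=sev)
```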
arXiv Detail & Related papers (2023-09-02T13:32:14Z)
- Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines? [4.802758600019422]
We adapt the 'diversity vs. recognizability' scoring framework from Boutin et al., 2022.
We find that one-shot diffusion models have indeed started to close the gap between humans and machines.
arXiv Detail & Related papers (2023-01-27T14:08:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the summaries (including all information) and is not responsible for any consequences of their use.