HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
- URL: http://arxiv.org/abs/2407.06937v1
- Date: Tue, 9 Jul 2024 15:14:41 GMT
- Title: HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
- Authors: Guian Fang, Wenbiao Yan, Yuanfan Guo, Jianhua Han, Zutao Jiang, Hang Xu, Shengcai Liao, Xiaodan Liang,
- Abstract summary: AbHuman is the first large-scale synthesized human benchmark focusing on anatomical anomalies.
HumanRefiner is a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation.
- Score: 80.97360194728705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image diffusion models have significantly advanced in conditional image generation. However, these models usually struggle with accurately rendering images featuring humans, resulting in distorted limbs and other anomalies. This issue primarily stems from the insufficient recognition and evaluation of limb qualities in diffusion models. To address this issue, we introduce AbHuman, the first large-scale synthesized human benchmark focusing on anatomical anomalies. This benchmark consists of 56K synthesized human images, each annotated with detailed, bounding-box level labels identifying 147K human anomalies in 18 different categories. Based on this, the recognition of human anomalies can be established, which in turn enhances image generation through traditional techniques such as negative prompting and guidance. To further boost the improvement, we propose HumanRefiner, a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation. Specifically, HumanRefiner utilizes a self-diagnostic procedure to detect and correct issues related to both coarse-grained abnormal human poses and fine-grained anomaly levels, facilitating pose-reversible diffusion generation. Experimental results on the AbHuman benchmark demonstrate that HumanRefiner significantly reduces generative discrepancies, achieving a 2.9x improvement in limb quality compared to the state-of-the-art open-source generator SDXL and a 1.4x improvement over DALL-E 3 in human evaluations. Our data and code are available at https://github.com/Enderfga/HumanRefiner.
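The self-diagnostic loop described above can be caricatured in a few lines. This is a minimal sketch in the spirit of HumanRefiner, not the authors' actual pipeline: `generate` stands in for a diffusion model call, `detect_anomalies` stands in for an AbHuman-trained anomaly detector, and detected anomaly categories are fed back as negative prompts.

```python
# Hypothetical stand-ins for the diffusion model and the anomaly detector.
def generate(prompt, negative_prompt="", seed=0):
    """Stand-in for a text-to-image diffusion call; returns a fake image record."""
    return {"prompt": prompt, "negative": negative_prompt, "seed": seed}

def detect_anomalies(image):
    """Stand-in detector: pretend the first two drafts contain a limb anomaly."""
    return ["distorted limb"] if image["seed"] < 2 else []

def refine(prompt, max_rounds=5):
    """Regenerate with negative prompting until the detector reports no anomalies."""
    negative = ""
    image = None
    for seed in range(max_rounds):
        image = generate(prompt, negative_prompt=negative, seed=seed)
        anomalies = detect_anomalies(image)
        if not anomalies:
            return image, seed
        # Feed detected anomaly categories back as negative prompts.
        negative = ", ".join(sorted(set(anomalies)))
    return image, max_rounds - 1

image, rounds = refine("a person waving")
```

With the toy detector above, the loop converges on the third attempt once the negative prompt carries the detected anomaly category.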
Related papers
- Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback [5.9726297901501475]
We introduce a novel approach tailored specifically for human image generation utilizing Direct Preference Optimization (DPO).

Specifically, we introduce an efficient method for constructing a specialized DPO dataset for training human image generation models without the need for costly human feedback.
Our method demonstrates its versatility and effectiveness in generating human images, including personalized text-to-image generation.
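The idea of building a preference dataset without human labels can be sketched as follows. All names here are illustrative assumptions, not the paper's API: `quality_score` stands in for an AI-feedback scorer, and each prompt's best- and worst-scored candidates become the chosen/rejected pair that DPO trains on.

```python
def quality_score(candidate):
    """Stand-in for an automatic quality model; here, just string length."""
    return len(candidate)

def build_dpo_pairs(prompts, candidates_per_prompt):
    """Pair the best- and worst-scored generation per prompt for DPO training."""
    pairs = []
    for prompt, candidates in zip(prompts, candidates_per_prompt):
        ranked = sorted(candidates, key=quality_score)
        # DPO consumes (prompt, preferred, dispreferred) triples.
        pairs.append({"prompt": prompt,
                      "chosen": ranked[-1],
                      "rejected": ranked[0]})
    return pairs

pairs = build_dpo_pairs(
    ["a runner at dawn"],
    [["img_ok", "img_better_quality", "bad"]],
)
```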
arXiv Detail & Related papers (2024-05-30T16:18:05Z) - AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]

We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe the human joint in the image and encode a fine-grained local feature.
arXiv Detail & Related papers (2024-03-26T17:59:23Z) - Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation [24.49857926071974]
Vanilla text-to-image diffusion models struggle with generating accurate human images.
Existing methods address this issue mostly by fine-tuning the model with extra images or adding additional controls.
This paper explores the integration of human-centric priors directly into the model fine-tuning stage.
arXiv Detail & Related papers (2024-03-08T11:59:32Z) - Towards the Detection of AI-Synthesized Human Face Images [12.090322373964124]
This paper presents a benchmark including human face images produced by Generative Adversarial Networks (GANs) and a variety of diffusion models (DMs).
The forgery traces introduced by different generative models are then analyzed in the frequency domain to draw various insights.
The paper further demonstrates that a detector trained with frequency representation can generalize well to other unseen generative models.
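The kind of frequency representation such detectors rely on can be illustrated with a radially averaged log-power spectrum, where generator-specific upsampling artifacts tend to appear as peaks. This is a generic sketch of the technique, not the paper's exact pipeline.

```python
import numpy as np

def radial_power_spectrum(image, n_bins=16):
    """Radially averaged log-power spectrum of a 2D grayscale array."""
    f = np.fft.fftshift(np.fft.fft2(image))
    power = np.log1p(np.abs(f) ** 2)
    h, w = image.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2)
    # Bin pixels by distance from the spectrum center and average each annulus.
    bins = np.linspace(0, r.max() + 1e-6, n_bins + 1)
    which = np.digitize(r.ravel(), bins) - 1
    return np.array([power.ravel()[which == i].mean() for i in range(n_bins)])

rng = np.random.default_rng(0)
feat = radial_power_spectrum(rng.standard_normal((64, 64)))
```

The resulting fixed-length profile can then serve as input to a simple classifier that separates real from synthesized faces.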
arXiv Detail & Related papers (2024-02-13T19:37:44Z) - HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion [53.1558345421646]
We propose HumanRef, a 3D human generation framework from a single-view input.
To ensure the generated 3D model is photorealistic and consistent with the input image, HumanRef introduces a novel method called reference-guided score distillation sampling.
Experimental results demonstrate that HumanRef outperforms state-of-the-art methods in generating 3D clothed humans.
arXiv Detail & Related papers (2023-11-28T17:06:28Z) - Deceptive-Human: Prompt-to-NeRF 3D Human Generation with 3D-Consistent Synthetic Images [67.31920821192323]
Deceptive-Human is a novel framework capitalizing on state-of-the-art control diffusion models (e.g., ControlNet) to generate a high-quality controllable 3D human NeRF.
Our method is versatile and readily accommodates various inputs, including a text prompt and additional data such as a 3D mesh, poses, and seed images.
The resulting 3D human NeRF model empowers the synthesis of highly photorealistic views from 360-degree perspectives.
arXiv Detail & Related papers (2023-11-27T15:49:41Z) - HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion [114.15397904945185]
We propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts.
Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network.
Our framework yields state-of-the-art performance, generating hyper-realistic human images under diverse scenarios.
arXiv Detail & Related papers (2023-10-12T17:59:34Z) - Exploring the Robustness of Human Parsers Towards Common Corruptions [99.89886010550836]
We construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models.
Inspired by the data augmentation strategy, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions.
arXiv Detail & Related papers (2023-09-02T13:32:14Z) - Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines? [4.802758600019422]
We adapt the 'diversity vs. recognizability' scoring framework from Boutin et al., 2022.
We find that one-shot diffusion models have indeed started to close the gap between humans and machines.
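A toy rendering of the 'diversity vs. recognizability' axes: in some feature space, diversity is the mean pairwise distance among samples a model generates for one concept, and recognizability is how often a simple classifier assigns those samples to the right concept. The feature space and nearest-centroid classifier below are illustrative assumptions, not the exact protocol of Boutin et al.

```python
import numpy as np

def diversity(samples):
    """Mean pairwise Euclidean distance among generated samples."""
    n = len(samples)
    dists = [np.linalg.norm(samples[i] - samples[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

def recognizability(samples, centroids, true_class):
    """Fraction of samples whose nearest class centroid is the true class."""
    hits = 0
    for s in samples:
        pred = min(centroids, key=lambda c: np.linalg.norm(s - centroids[c]))
        hits += pred == true_class
    return hits / len(samples)

centroids = {"A": np.array([0.0, 0.0]), "B": np.array([10.0, 0.0])}
samples = [np.array([0.5, 0.1]), np.array([-0.2, 0.4]), np.array([0.1, -0.3])]
d = diversity(samples)
r = recognizability(samples, centroids, "A")
```

Plotting generators on these two axes is what lets the paper compare one-shot diffusion models against human drawings.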
arXiv Detail & Related papers (2023-01-27T14:08:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.