Evaluating and Predicting Distorted Human Body Parts for Generated Images
- URL: http://arxiv.org/abs/2503.00811v1
- Date: Sun, 02 Mar 2025 09:34:44 GMT
- Title: Evaluating and Predicting Distorted Human Body Parts for Generated Images
- Authors: Lu Ma, Kaibo Cao, Hao Liang, Jiaxin Lin, Zhuang Li, Yuhong Liu, Jihong Zhang, Wentao Zhang, Bin Cui
- Abstract summary: We propose ViT-HD, a Vision Transformer-based model tailored for detecting human body distortions in AI-generated images. We construct the Human Distortion Benchmark with 500 human-centric prompts to evaluate four popular T2I models. This work pioneers a systematic approach to evaluating anatomical accuracy in AI-generated humans, offering tools to advance the fidelity of T2I models.
- Score: 44.49888268318722
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in text-to-image (T2I) models enable high-quality image synthesis, yet generating anatomically accurate human figures remains challenging. AI-generated images frequently exhibit distortions such as proliferated limbs, missing fingers, deformed extremities, or fused body parts. Existing evaluation metrics like Inception Score (IS) and Fréchet Inception Distance (FID) lack the granularity to detect these distortions, while human preference-based metrics focus on abstract quality assessments rather than anatomical fidelity. To address this gap, we establish the first standards for identifying human body distortions in AI-generated images and introduce Distortion-5K, a comprehensive dataset comprising 4,700 annotated images of normal and malformed human figures across diverse styles and distortion types. Based on this dataset, we propose ViT-HD, a Vision Transformer-based model tailored for detecting human body distortions in AI-generated images, which outperforms state-of-the-art segmentation models and visual language models, achieving an F1 score of 0.899 and an IoU of 0.831 on distortion localization. Additionally, we construct the Human Distortion Benchmark with 500 human-centric prompts to evaluate four popular T2I models using the trained ViT-HD, revealing that nearly 50% of generated images contain distortions. This work pioneers a systematic approach to evaluating anatomical accuracy in AI-generated humans, offering tools to advance the fidelity of T2I models and their real-world applicability. The Distortion-5K dataset and the trained ViT-HD model will soon be released in our GitHub repository: https://github.com/TheRoadQaQ/Predicting-Distortion
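The reported F1 of 0.899 and IoU of 0.831 refer to distortion localization, i.e., agreement between predicted and annotated distorted regions. The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch of how pixel-level F1 and IoU are conventionally computed from binary masks; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def distortion_localization_metrics(pred_mask, gt_mask, eps=1e-8):
    """Pixel-level F1 and IoU between predicted and ground-truth
    binary distortion masks (1 = distorted region)."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)

    tp = np.logical_and(pred, gt).sum()    # distorted pixels correctly flagged
    fp = np.logical_and(pred, ~gt).sum()   # false alarms
    fn = np.logical_and(~pred, gt).sum()   # missed distorted pixels

    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)        # intersection over union
    return f1, iou

# Toy 4x4 example (values are illustrative only)
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
gt = np.array([[0, 1, 1, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]])
f1, iou = distortion_localization_metrics(pred, gt)
print(f"F1 = {f1:.3f}, IoU = {iou:.3f}")   # F1 ≈ 0.857, IoU ≈ 0.750
```

Whether ViT-HD aggregates these scores per image or over the whole test set is not stated in the abstract; the sketch above shows the per-mask case.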
Related papers
- Quality Assessment and Distortion-aware Saliency Prediction for AI-Generated Omnidirectional Images [70.49595920462579]
This work studies the quality assessment and distortion-aware saliency prediction problems for AIGODIs. We propose two models with shared encoders based on the BLIP-2 model to evaluate the human visual experience and predict distortion-aware saliency for AI-generated omnidirectional images.
arXiv Detail & Related papers (2025-06-27T05:36:04Z) - AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images [58.87047247313503]
We introduce AGHI-QA, the first large-scale benchmark specifically designed for quality assessment of AI-generated human images (AGHIs).
The dataset comprises 4,000 images generated from 400 carefully crafted text prompts using 10 state-of-the-art T2I models.
We conduct a systematic subjective study to collect multidimensional annotations, including perceptual quality scores, text-image correspondence scores, and visible and distorted body part labels.
arXiv Detail & Related papers (2025-04-30T04:36:56Z) - Human Body Restoration with One-Step Diffusion Model and A New Benchmark [74.66514054623669]
We propose a high-quality dataset automated cropping and filtering (HQ-ACF) pipeline. This pipeline leverages existing object detection datasets and other unlabeled images to automatically crop and filter high-quality human images. We also propose OSDHuman, a novel one-step diffusion model for human body restoration.
arXiv Detail & Related papers (2025-02-03T14:48:40Z) - Enhancing Early Diabetic Retinopathy Detection through Synthetic DR1 Image Generation: A StyleGAN3 Approach [0.0]
This study uses StyleGAN3 to generate synthetic DR1 images characterized by microaneurysms with high fidelity and diversity. A dataset of 2,602 DR1 images was used to train the model, followed by a comprehensive evaluation using quantitative metrics. The model achieved a final FID score of 17.29, outperforming the mean FID of 21.18 (95% confidence interval: 20.83 to 21.56) derived from bootstrap resampling. A generic sketch of such a bootstrap confidence interval appears after this list.
arXiv Detail & Related papers (2025-01-01T21:00:58Z) - ANID: How Far Are We? Evaluating the Discrepancies Between AI-synthesized Images and Natural Images through Multimodal Guidance [19.760989919485894]
We introduce an AI-Natural Image Discrepancy Evaluation benchmark aimed at addressing the critical question: how far are AI-generated images from truly realistic images? We have constructed a large-scale multimodal dataset, the Distinguishing Natural and AI-generated Images (DNAI) dataset, which includes over 440,000 AIGI samples generated by 8 representative models. Our fine-grained assessment framework provides a comprehensive evaluation of the DNAI dataset across five key dimensions.
arXiv Detail & Related papers (2024-12-23T15:08:08Z) - Detecting Human Artifacts from Text-to-Image Models [16.261759535724778]
The dataset contains images of human bodies generated by text-to-image models, including poorly generated examples with distorted or missing body parts.
arXiv Detail & Related papers (2024-11-21T05:02:13Z) - HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance [80.97360194728705]
AbHuman is the first large-scale synthesized human benchmark focusing on anatomical anomalies.
HumanRefiner is a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation.
arXiv Detail & Related papers (2024-07-09T15:14:41Z) - Generative Model-Driven Synthetic Training Image Generation: An Approach to Cognition in Rail Defect Detection [12.584718477246382]
This study proposes a VAE-based synthetic image generation technique for rail defects.
It is applied to create a synthetic dataset for the Canadian Pacific Railway.
500 synthetic samples are generated with a minimal reconstruction loss of 0.021.
arXiv Detail & Related papers (2023-12-31T04:34:58Z) - Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction [66.10717041384625]
Zolly is the first 3DHMR method focusing on perspective-distorted images.
We propose a new camera model and a novel 2D representation, termed distortion image, which describes the 2D dense distortion scale of the human body.
We extend two real-world datasets tailored for this task, both containing perspective-distorted human images.
arXiv Detail & Related papers (2023-03-24T04:22:41Z) - NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
To address the problem that no paired training data exist, we introduce a novel self-supervised strategy to train our network.
arXiv Detail & Related papers (2022-03-20T09:02:13Z)
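Referring back to the StyleGAN3 entry above: the 95% confidence interval quoted there (20.83 to 21.56 around a mean FID of 21.18) was obtained via bootstrap resampling. That paper's exact procedure is not described in the summary, so the following is only a generic percentile-bootstrap sketch over hypothetical per-run FID scores; the numbers and names below are illustrative, not from the paper.

```python
import numpy as np

def bootstrap_mean_ci(scores, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of a set of scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    boot_means = np.empty(n_boot)
    for i in range(n_boot):
        # Resample the scores with replacement and record the resample mean
        resample = rng.choice(scores, size=scores.size, replace=True)
        boot_means[i] = resample.mean()
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), lo, hi

# Hypothetical FID values from repeated evaluation runs (not taken from the paper)
fid_runs = [21.4, 20.9, 21.7, 20.6, 21.3, 21.0, 21.9, 20.8]
mean_fid, lo, hi = bootstrap_mean_ci(fid_runs)
print(f"mean FID = {mean_fid:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```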