HandEval: Taking the First Step Towards Hand Quality Evaluation in Generated Images
- URL: http://arxiv.org/abs/2510.08978v1
- Date: Fri, 10 Oct 2025 03:39:10 GMT
- Title: HandEval: Taking the First Step Towards Hand Quality Evaluation in Generated Images
- Authors: Zichuan Wang, Bo Peng, Songlin Yang, Zhenchen Tang, Jing Dong,
- Abstract summary: We develop HandEval, a hand-specific quality assessment model. HandEval aligns better with human judgments than existing SOTA methods. We integrate HandEval into image generation and AIGC detection pipelines.
- Score: 23.918454005000328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although recent text-to-image (T2I) models have significantly improved the overall visual quality of generated images, they still struggle to generate accurate details in complex local regions, especially human hands. Generated hands often exhibit structural distortions and unrealistic textures, which can be very noticeable even when the rest of the body is well generated. However, the quality assessment of hand regions remains largely neglected, limiting the performance of downstream tasks such as human-centric generation quality optimization and AIGC detection. To address this, we propose the first quality assessment task targeting generated hand regions and demonstrate its many downstream applications. We first introduce the HandPair dataset for training hand quality assessment models. It consists of 48k images formed by high- and low-quality hand pairs, enabling low-cost, efficient supervision without manual annotation. Based on it, we develop HandEval, a carefully designed hand-specific quality assessment model. It leverages the powerful visual understanding capability of a Multimodal Large Language Model (MLLM) and incorporates prior knowledge of hand keypoints, which gives it a strong perception of hand quality. We further construct a human-annotated test set with hand images from various state-of-the-art (SOTA) T2I models to validate its quality evaluation capability. Results show that HandEval aligns better with human judgments than existing SOTA methods. Furthermore, we integrate HandEval into image generation and AIGC detection pipelines, markedly improving the realism of generated hands and detection accuracy, respectively, confirming its broad effectiveness in downstream applications. Code and dataset will be available.
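The abstract does not state how HandEval consumes the HandPair pairs or which agreement measure it reports against the human-annotated test set. As a hedged illustration only, the sketch below shows one common way paired high-/low-quality examples can supervise a scalar quality scorer without absolute labels (a Bradley-Terry-style pairwise logistic loss), plus a simple pairwise agreement rate against human preferences. Both function names are hypothetical and are not taken from the paper; the scores stand in for whatever a hand quality model would output.

```python
import math

def pairwise_logistic_loss(score_high: float, score_low: float) -> float:
    """Bradley-Terry style loss -log(sigmoid(score_high - score_low)).

    Hypothetical helper: small when the high-quality hand outscores the
    low-quality one, large when the ranking is inverted.
    """
    margin = score_high - score_low
    # Numerically stable softplus(-margin): avoids overflow in exp() for large |margin|
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def pairwise_agreement(pairs) -> float:
    """Fraction of (score_a, score_b, human_prefers_a) pairs where the model agrees.

    Hypothetical helper: a tie (score_a == score_b) counts as preferring b.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    hits = sum((a > b) == human_prefers_a for a, b, human_prefers_a in pairs)
    return hits / len(pairs)

if __name__ == "__main__":
    # Toy scores, not results from the paper
    print(pairwise_logistic_loss(2.0, 0.0), pairwise_logistic_loss(0.0, 2.0))
    print(pairwise_agreement([(0.9, 0.1, True), (0.2, 0.8, True), (0.3, 0.7, False)]))
```

Training on the score difference rather than on absolute scores is what lets paired data replace manual per-image ratings: the model only has to learn which hand in each pair is better.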
Related papers
- Q-REAL: Towards Realism and Plausibility Evaluation for AI-Generated Content [71.46991494014382]
We introduce Q-Real, a novel dataset for fine-grained evaluation of realism and plausibility in AI-generated images. Q-Real consists of 3,088 images generated by popular text-to-image models. We construct Q-Real Bench to evaluate them on two tasks: judgment and grounding with reasoning.
Date: 2025-11-21T02:43:17Z
- Quality Assessment and Distortion-aware Saliency Prediction for AI-Generated Omnidirectional Images [70.49595920462579]
This work studies the quality assessment and distortion-aware saliency prediction problems for AI-generated omnidirectional images (AIGODIs). We propose two models with shared encoders based on the BLIP-2 model to evaluate the human visual experience and predict distortion-aware saliency for AI-generated omnidirectional images.
Date: 2025-06-27T05:36:04Z
- AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images [58.87047247313503]
We introduce AGHI-QA, the first large-scale benchmark specifically designed for quality assessment of AI-generated human images (AGHIs). The dataset comprises 4,000 images generated from 400 carefully crafted text prompts using 10 state-of-the-art T2I models. We conduct a systematic subjective study to collect multidimensional annotations, including perceptual quality scores, text-image correspondence scores, and labels for visible and distorted body parts.
Date: 2025-04-30T04:36:56Z
- MGHanD: Multi-modal Guidance for authentic Hand Diffusion [25.887930576638293]
MGHanD addresses persistent challenges in generating realistic human hands. We employ a discriminator trained on a dataset comprising paired real and generated images with captions. We also employ textual guidance with a LoRA adapter, which learns the direction from the prompt "hands" towards more detailed prompts.
Date: 2025-03-11T07:51:47Z
- FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation [11.843140646170458]
We present FoundHand, a large-scale domain-specific diffusion model for single and dual hand images. We use FoundHand-10M, a large-scale hand dataset with 2D keypoints and segmentation mask annotations. Our model exhibits core capabilities that include the ability to repose hands, transfer hand appearance, and even synthesize novel views.
Date: 2024-12-03T18:58:19Z
- High Quality Human Image Animation using Regional Supervision and Motion Blur Condition [97.97432499053966]
First, we leverage regional supervision for detailed regions to enhance face and hand faithfulness.
Second, we model the motion blur explicitly to further improve the appearance quality.
Third, we explore novel training strategies for high-resolution human animation to improve the overall fidelity.
Date: 2024-09-29T06:46:31Z
- Adaptive Multi-Modal Control of Digital Human Hand Synthesis Using a Region-Aware Cycle Loss [12.565642618427844]
Diffusion models can synthesize images, including the generation of humans in specific poses.
Current models face challenges in adequately expressing conditional control for detailed hand pose generation.
We propose a novel Region-Aware Cycle Loss (RACL) that enables the diffusion model training to focus on improving the hand region.
Date: 2024-09-13T19:09:19Z
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
Date: 2024-07-24T06:42:46Z
- G-Refine: A General Quality Refiner for Text-to-Image Generation [74.16137826891827]
We introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compromising the integrity of high-quality ones.
The model is composed of three interconnected modules: a perception quality indicator, an alignment quality indicator, and a general quality enhancement module.
Extensive experiments show that AI-generated images (AIGIs) refined by G-Refine score higher on more than 10 quality metrics across 4 databases.
Date: 2024-04-29T00:54:38Z
- Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network [92.01145655155374]
We present an unsupervised image enhancement generative network (UEGAN).
It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner.
Results show that the proposed model effectively improves the aesthetic quality of images.
Date: 2020-12-30T03:22:46Z