Fine-Grained Human Pose Editing Assessment via Layer-Selective MLLMs
- URL: http://arxiv.org/abs/2601.10369v2
- Date: Mon, 19 Jan 2026 12:26:29 GMT
- Title: Fine-Grained Human Pose Editing Assessment via Layer-Selective MLLMs
- Authors: Ningyu Sun, Zhaolin Cai, Zitong Xu, Peihang Chen, Huiyu Duan, Yichao Yan, Xiongkuo Min, Xiaokang Yang,
- Abstract summary: We introduce a benchmark comprising 1,700 standardized samples from 17 state-of-the-art editing models. We propose a unified framework based on layer-selective multimodal large language models (MLLMs). Our framework achieves superior performance in both authenticity detection and multi-dimensional quality regression.
- Score: 70.31435391393642
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Text-guided human pose editing has gained significant traction in AIGC applications. However, it remains plagued by structural anomalies and generative artifacts. Existing evaluation metrics often isolate authenticity detection from quality assessment, failing to provide fine-grained insights into pose-specific inconsistencies. To address these limitations, we introduce HPE-Bench, a specialized benchmark comprising 1,700 standardized samples from 17 state-of-the-art editing models, offering both authenticity labels and multi-dimensional quality scores. Furthermore, we propose a unified framework based on layer-selective multimodal large language models (MLLMs). By employing contrastive LoRA tuning and a novel layer sensitivity analysis (LSA) mechanism, we identify the optimal feature layer for pose evaluation. Our framework achieves superior performance in both authenticity detection and multi-dimensional quality regression, effectively bridging the gap between forensic detection and quality assessment.
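The abstract's layer sensitivity analysis (LSA) can be pictured as probing each layer's features for how well they predict the target scores, then selecting the most predictive layer. The sketch below is a minimal, hypothetical illustration of that idea using synthetic features and a least-squares linear probe; the paper's actual mechanism, feature dimensions, and scoring criterion may differ.

```python
import numpy as np

def layer_sensitivity(layer_features, scores):
    """Rank layers by how well a linear probe on each layer's features
    predicts the quality scores (Pearson correlation of probe output)."""
    sensitivities = []
    for feats in layer_features:  # feats: (n_samples, dim)
        # least-squares linear probe with a bias column
        X = np.hstack([feats, np.ones((feats.shape[0], 1))])
        w, *_ = np.linalg.lstsq(X, scores, rcond=None)
        preds = X @ w
        sensitivities.append(np.corrcoef(preds, scores)[0, 1])
    return int(np.argmax(sensitivities)), sensitivities

# Toy demo: three "layers" of synthetic features; only layer 1
# actually carries the quality signal, so LSA should select it.
rng = np.random.default_rng(0)
scores = rng.uniform(1.0, 5.0, size=200)  # simulated quality scores
layers = [
    rng.normal(size=(200, 16)),                           # pure noise
    np.column_stack([scores + 0.1 * rng.normal(size=200),
                     rng.normal(size=(200, 15))]),        # signal + noise
    rng.normal(size=(200, 16)),                           # pure noise
]
best, sens = layer_sensitivity(layers, scores)
print(best)  # the signal-bearing layer
```

In a real MLLM pipeline, `layer_features` would be the per-layer hidden states of the tuned model on the benchmark samples, and the selected layer's features would feed the quality-regression head.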
Related papers
- Q-REAL: Towards Realism and Plausibility Evaluation for AI-Generated Content [71.46991494014382]
We introduce Q-Real, a novel dataset for fine-grained evaluation of realism and plausibility in AI-generated images. Q-Real consists of 3,088 images generated by popular text-to-image models. We construct Q-Real Bench to evaluate them on two tasks: judgment and grounding with reasoning.
arXiv Detail & Related papers (2025-11-21T02:43:17Z) - Selective Adversarial Attacks on LLM Benchmarks [1.6307653659652344]
We study selective adversarial attacks on the widely used benchmark MMLU. We find that selective adversarial attacks exist and can materially alter relative rankings. Our results motivate perturbation-aware reporting and robustness evaluation.
arXiv Detail & Related papers (2025-10-15T14:08:44Z) - Span-level Detection of AI-generated Scientific Text via Contrastive Learning and Structural Calibration [2.105564340986074]
Sci-SpanDet is a structure-aware framework for detecting AI-generated scholarly texts. It combines section-conditioned stylistic modeling with multi-level contrastive learning to capture nuanced human-AI differences. It achieves state-of-the-art performance, with F1(AI) of 80.17, AUROC of 92.63, and Span-F1 of 74.36.
arXiv Detail & Related papers (2025-10-01T13:35:14Z) - Expert Preference-based Evaluation of Automated Related Work Generation [54.29459509574242]
We propose GREP, a multi-turn evaluation framework that integrates classical related work evaluation criteria with expert-specific preferences. For better accessibility, we design two variants of GREP: a more precise variant with proprietary LLMs as evaluators, and a cheaper alternative with open-weight LLMs.
arXiv Detail & Related papers (2025-08-11T13:08:07Z) - Generate Aligned Anomaly: Region-Guided Few-Shot Anomaly Image-Mask Pair Synthesis for Industrial Inspection [53.137651284042434]
Anomaly inspection plays a vital role in industrial manufacturing, but the scarcity of anomaly samples limits the effectiveness of existing methods. We propose Generate Aligned Anomaly (GAA), a region-guided, few-shot anomaly image-mask pair generation framework. GAA generates realistic, diverse, and semantically aligned anomalies using only a small number of samples.
arXiv Detail & Related papers (2025-07-13T12:56:59Z) - AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images [58.87047247313503]
We introduce AGHI-QA, the first large-scale benchmark specifically designed for quality assessment of AI-generated human images (AGHIs). The dataset comprises 4,000 images generated from 400 carefully crafted text prompts using 10 state-of-the-art T2I models. We conduct a systematic subjective study to collect multidimensional annotations, including perceptual quality scores, text-image correspondence scores, and visible and distorted body part labels.
arXiv Detail & Related papers (2025-04-30T04:36:56Z) - Towards Explainable Partial-AIGC Image Quality Assessment [51.42831861127991]
Despite extensive research on image quality assessment (IQA) for AI-generated images (AGIs), most studies focus on fully AI-generated outputs. We construct the first large-scale PAI dataset towards explainable partial-AIGC image quality assessment (EPAIQA). Our work represents a pioneering effort in the perceptual IQA field for comprehensive PAI quality assessment.
arXiv Detail & Related papers (2025-04-12T17:27:50Z) - M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment [65.3860007085689]
M3-AGIQA is a comprehensive framework that enables more human-aligned, holistic evaluation of AI-generated images. By aligning model outputs more closely with human judgment, M3-AGIQA delivers robust and interpretable quality scores.
arXiv Detail & Related papers (2025-02-21T03:05:45Z) - Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z) - TISE: A Toolbox for Text-to-Image Synthesis Evaluation [9.092600296992925]
We conduct a study on state-of-the-art methods for single- and multi-object text-to-image synthesis.
We propose a common framework for evaluating these methods.
arXiv Detail & Related papers (2021-12-02T16:39:35Z) - Cross-Quality LFW: A Database for Analyzing Cross-Resolution Image Face Recognition in Unconstrained Environments [8.368543987898732]
Real-world face recognition applications often deal with suboptimal image quality or resolution due to different capturing conditions.
Recent cross-resolution face recognition approaches used simple, arbitrary, and unrealistic down- and up-scaling techniques to measure distances against real-world edge-cases in image quality.
We propose a new standardized benchmark dataset and evaluation protocol derived from the famous Labeled Faces in the Wild.
arXiv Detail & Related papers (2021-08-23T17:04:32Z) - Generating Adversarial Examples with an Optimized Quality [12.747258403133035]
Deep learning models are vulnerable to Adversarial Examples (AEs), carefully crafted samples designed to deceive those models.
Recent studies have introduced new adversarial attack methods, but none provided guaranteed quality for the crafted examples.
In this paper, we incorporate Image Quality Assessment (IQA) metrics into the design and generation process of AEs.
arXiv Detail & Related papers (2020-06-30T23:05:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.