Related papers: HPSv3: Towards Wide-Spectrum Human Preference Score

URL: http://arxiv.org/abs/2508.03789v1
Date: Tue, 05 Aug 2025 17:17:13 GMT
Title: HPSv3: Towards Wide-Spectrum Human Preference Score
Authors: Yuhang Ma, Xiaoshi Wu, Keqiang Sun, Hongsheng Li
Abstract summary: We release the first wide-spectrum human preference dataset integrating 1.08M text-image pairs and 1.17M annotated pairwise comparisons. We introduce a VLM-based preference model trained using an uncertainty-aware ranking loss for fine-grained ranking. We also propose Chain-of-Human-Preference (CoHP), an iterative image refinement method that enhances quality without extra data.
Score: 35.108959799842694
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Evaluating text-to-image generation models requires alignment with human perception, yet existing human-centric metrics are constrained by limited data coverage, suboptimal feature extraction, and inefficient loss functions. To address these challenges, we introduce Human Preference Score v3 (HPSv3). (1) We release HPDv3, the first wide-spectrum human preference dataset integrating 1.08M text-image pairs and 1.17M annotated pairwise comparisons from state-of-the-art generative models and low to high-quality real-world images. (2) We introduce a VLM-based preference model trained using an uncertainty-aware ranking loss for fine-grained ranking. Besides, we propose Chain-of-Human-Preference (CoHP), an iterative image refinement method that enhances quality without extra data, using HPSv3 to select the best image at each step. Extensive experiments demonstrate that HPSv3 serves as a robust metric for wide-spectrum image evaluation, and CoHP offers an efficient and human-aligned approach to improve image generation quality. The code and dataset are available at the HPSv3 Homepage.
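
Illustrative sketch (not from the paper): the abstract describes CoHP only as an iterative refinement that uses HPSv3 to pick the best image at each step. The Python below shows one generic way such a select-the-best loop could look. The names iterative_select, generate, and score are hypothetical placeholders, not the HPSv3 API, and keeping the best image seen so far across rounds is an assumption made for this sketch.

```python
from typing import Callable, List, Optional, TypeVar

Image = TypeVar("Image")


def iterative_select(
    prompt: str,
    generate: Callable[[str, Optional[Image]], List[Image]],
    score: Callable[[str, Image], float],
    rounds: int = 3,
) -> Image:
    """Keep the highest-scoring candidate across several generation rounds.

    generate(prompt, previous_best) is a hypothetical sampler that returns
    candidate images, optionally conditioned on the previous best image.
    score(prompt, image) is a hypothetical preference scorer (higher is better).
    """
    best: Optional[Image] = None
    best_score = float("-inf")
    for _ in range(rounds):
        # Propose new candidates, seeded with the best image found so far.
        for candidate in generate(prompt, best):
            s = score(prompt, candidate)
            if s > best_score:
                best, best_score = candidate, s
    if best is None:
        raise ValueError("generate() returned no candidates")
    return best
```

In practice the scorer would be a learned preference model and the generator a text-to-image sampler; both are left abstract here because the abstract does not specify them.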

Related papers

Human Body Restoration with One-Step Diffusion Model and A New Benchmark [74.66514054623669]
We propose a high-quality dataset automated cropping and filtering (HQ-ACF) pipeline. This pipeline leverages existing object detection datasets and other unlabeled images to automatically crop and filter high-quality human images. We also propose OSDHuman, a novel one-step diffusion model for human body restoration.
arXiv Detail & Related papers (2025-02-03T14:48:40Z)
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models [85.30735602813093]
Multi-Image Augmented Direct Preference Optimization (MIA-DPO) is a visual preference alignment approach that effectively handles multi-image inputs. MIA-DPO mitigates the scarcity of diverse multi-image training data by extending single-image data with unrelated images arranged in grid collages or pic-in-pic formats.
arXiv Detail & Related papers (2024-10-23T07:56:48Z)
G-Refine: A General Quality Refiner for Text-to-Image Generation [74.16137826891827]
We introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compromising integrity of high-quality ones. The model is composed of three interconnected modules: a perception quality indicator, an alignment quality indicator, and a general quality enhancement module. Extensive experimentation reveals that AIGIs after G-Refine outperform in 10+ quality metrics across 4 databases.
arXiv Detail & Related papers (2024-04-29T00:54:38Z)
Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation [65.91490997921859]
We propose an Uncertainty-Aware testing-time Optimization (UAO) framework for 3D human pose estimation. The framework keeps the prior information of the pre-trained model and alleviates the overfitting problem using the uncertainty of joints. Our approach outperforms the previous best result by a large margin of 5.5% on Human3.6M.
arXiv Detail & Related papers (2024-02-04T04:28:02Z)
VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space [43.368963897752664]
This work introduces a novel paradigm to address the Human Pose and Shape Estimation problem. Instead of predicting body model parameters, we focus on predicting the proposed discrete latent representation. The proposed model, VQ-HPS, predicts the discrete latent representation of the mesh.
arXiv Detail & Related papers (2023-12-13T17:08:38Z)
Exploring the Robustness of Human Parsers Towards Common Corruptions [99.89886010550836]
We construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models. Inspired by the data augmentation strategy, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions.
arXiv Detail & Related papers (2023-09-02T13:32:14Z)
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation [29.037799937729687]
Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods. We propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M, with minMPJPE $51.4$
arXiv Detail & Related papers (2023-07-07T21:03:18Z)
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis [38.70605308204128]
Recent text-to-image generative models can generate high-fidelity images from text inputs. HPD v2 captures human preferences on images from a wide range of sources. HPD v2 comprises 798,090 human preference choices on 433,760 pairs of images.
arXiv Detail & Related papers (2023-06-15T17:59:31Z)
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference [41.270068272447055]
We collect a dataset of human choices on generated images from the Stable Foundation Discord channel. Our experiments demonstrate that current evaluation metrics for generative models do not correlate well with human choices. We propose a simple yet effective method to adapt Stable Diffusion to better align with human preferences.
arXiv Detail & Related papers (2023-03-25T10:09:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.