F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration
- URL: http://arxiv.org/abs/2412.13155v1
- Date: Tue, 17 Dec 2024 18:28:48 GMT
- Title: F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration
- Authors: Lu Liu, Huiyu Duan, Qiang Hu, Liu Yang, Chunlei Cai, Tianxiao Ye, Huayu Liu, Xiaoyun Zhang, Guangtao Zhai
- Abstract summary: We introduce FaceQ, a large-scale, comprehensive database of AI-generated Face images with fine-grained Quality annotations reflecting human preferences.
The FaceQ database comprises 12,255 images generated by 29 models across three tasks: (1) face generation, (2) face customization, and (3) face restoration.
It includes 32,742 mean opinion scores (MOSs) from 180 annotators, assessed across multiple dimensions: quality, authenticity, identity (ID) fidelity, and text-image correspondence.
- Abstract: Artificial intelligence generative models exhibit remarkable capabilities in content creation, particularly in face image generation, customization, and restoration. However, current AI-generated faces (AIGFs) often fall short of human preferences due to unique distortions, unrealistic details, and unexpected identity shifts, underscoring the need for a comprehensive quality evaluation framework for AIGFs. To address this need, we introduce FaceQ, a large-scale, comprehensive database of AI-generated Face images with fine-grained Quality annotations reflecting human preferences. The FaceQ database comprises 12,255 images generated by 29 models across three tasks: (1) face generation, (2) face customization, and (3) face restoration. It includes 32,742 mean opinion scores (MOSs) from 180 annotators, assessed across multiple dimensions: quality, authenticity, identity (ID) fidelity, and text-image correspondence. Using the FaceQ database, we establish F-Bench, a benchmark for comparing and evaluating face generation, customization, and restoration models, highlighting strengths and weaknesses across various prompts and evaluation dimensions. Additionally, we assess the performance of existing image quality assessment (IQA), face quality assessment (FQA), AI-generated content image quality assessment (AIGCIQA), and preference evaluation metrics, showing that these standard metrics are relatively ineffective in evaluating authenticity, ID fidelity, and text-image correspondence. The FaceQ database will be publicly available upon publication.
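For context, benchmarks of this kind conventionally judge an objective metric by how strongly its predictions correlate with the human MOSs (typically Spearman, Pearson, and Kendall correlations). The sketch below illustrates that comparison with hypothetical scores using SciPy; it is not the paper's released evaluation code.

```python
# Minimal sketch of how a benchmark such as F-Bench typically compares an
# objective quality metric against human mean opinion scores (MOSs).
# The score arrays below are hypothetical placeholders, not FaceQ data.
import numpy as np
from scipy import stats

mos = np.array([4.2, 3.1, 2.5, 4.8, 1.9])                 # human MOSs for a set of images
metric_scores = np.array([0.81, 0.55, 0.60, 0.90, 0.35])  # predictions from an IQA/FQA metric

srcc, _ = stats.spearmanr(metric_scores, mos)   # rank (monotonic) agreement
plcc, _ = stats.pearsonr(metric_scores, mos)    # linear agreement
krcc, _ = stats.kendalltau(metric_scores, mos)  # pairwise ordering agreement

print(f"SRCC={srcc:.3f}  PLCC={plcc:.3f}  KRCC={krcc:.3f}")
```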
Related papers
- Rank-based No-reference Quality Assessment for Face Swapping [88.53827937914038]
In most face swapping methods, quality is measured using several distances between the manipulated images and the source image.
We present a novel no-reference image quality assessment (NR-IQA) method specifically designed for face swapping.
arXiv Detail & Related papers (2024-06-04T01:36:29Z) - Holistic Evaluation of Text-To-Image Models [153.47415461488097]
We introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM)
We identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency.
Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths.
arXiv Detail & Related papers (2023-11-07T19:00:56Z) - Advancing Zero-Shot Digital Human Quality Assessment through Text-Prompted Evaluation [60.873105678086404]
SJTU-H3D is a subjective quality assessment database specifically designed for full-body digital humans.
It comprises 40 high-quality reference digital humans and 1,120 labeled distorted counterparts generated with seven types of distortions.
arXiv Detail & Related papers (2023-07-06T06:55:30Z) - AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence [42.85549933048976]
We first generate over 2000 images based on 6 state-of-the-art text-to-image generation models using 100 prompts.
Based on these images, a subjective experiment is conducted to assess the human visual preferences for each image from three perspectives.
We conduct a benchmark experiment to evaluate the performance of several state-of-the-art IQA metrics on our constructed database.
arXiv Detail & Related papers (2023-07-01T03:30:31Z) - AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment [62.8834581626703]
We build AGIQA-3K, the most comprehensive subjective quality database to date.
We conduct a benchmark experiment on this database to evaluate the consistency between the current Image Quality Assessment (IQA) model and human perception.
We believe that the fine-grained subjective scores in AGIQA-3K will inspire subsequent AGI quality models to fit human subjective perception mechanisms.
arXiv Detail & Related papers (2023-06-07T18:28:21Z) - IFQA: Interpretable Face Quality Assessment [23.34924105158927]
This paper proposes a novel face-centric metric based on an adversarial framework where a generator simulates face restoration and a discriminator assesses image quality.
Our metric consistently surpasses existing general or facial image quality assessment metrics by impressive margins.
arXiv Detail & Related papers (2022-11-14T03:04:38Z) - Going the Extra Mile in Face Image Quality Assessment: A Novel Database and Model [42.05084438912876]
We introduce the largest annotated IQA database developed to date, which contains 20,000 human faces.
We propose a novel deep learning model to accurately predict face image quality, which, for the first time, explores the use of generative priors for IQA.
arXiv Detail & Related papers (2022-07-11T14:28:18Z) - FaceQgen: Semi-Supervised Deep Learning for Face Image Quality Assessment [19.928262020265965]
FaceQgen is a No-Reference Quality Assessment approach for face images.
It generates a scalar quality measure related to face recognition accuracy.
It is trained from scratch using the SCface database.
arXiv Detail & Related papers (2022-01-03T17:22:38Z) - FaceQvec: Vector Quality Assessment for Face Biometrics based on ISO Compliance [15.913755899679733]
FaceQvec is a software component for estimating how well facial images conform to each of the requirements specified in ISO/IEC 19794-5.
This standard defines general quality guidelines that determine whether a face image is acceptable for use in official documents such as passports or ID cards.
arXiv Detail & Related papers (2021-11-03T09:07:41Z) - Inducing Predictive Uncertainty Estimation for Face Recognition [102.58180557181643]
We propose a method for generating image quality training data automatically from 'mated-pairs' of face images.
We use the generated data to train a lightweight Predictive Confidence Network, termed PCNet, for estimating the confidence score of a face image.
arXiv Detail & Related papers (2020-09-01T17:52:00Z)