Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation
- URL: http://arxiv.org/abs/2307.09416v2
- Date: Wed, 19 Jul 2023 08:27:50 GMT
- Title: Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation
- Authors: Federico Betti, Jacopo Staiano, Lorenzo Baraldi, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe
- Abstract summary: We introduce an automated method for Visual Concept Evaluation (ViCE) to assess consistency between a generated/edited image and the corresponding prompt/instructions.
ViCE combines the strengths of Large Language Models (LLMs) and Visual Question Answering (VQA) into a unified pipeline, aiming to replicate the human cognitive process in quality assessment.
- Score: 96.74302670358145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research in Image Generation has recently made significant progress,
particularly boosted by the introduction of Vision-Language models which are
able to produce high-quality visual content based on textual inputs. Despite
ongoing advancements in terms of generation quality and realism, no methodical
framework has yet been defined to quantitatively measure the quality of the
generated content and its adherence to the prompted requests: so far, only
human-based evaluations have been adopted for assessing quality and for
comparing different generative methods. We introduce a novel automated method
for Visual Concept Evaluation (ViCE), i.e., to assess consistency between a
generated/edited image and the corresponding prompt/instructions, with a
process inspired by human cognitive behaviour. ViCE combines the strengths
of Large Language Models (LLMs) and Visual Question Answering (VQA) into a
unified pipeline, aiming to replicate the human cognitive process in quality
assessment. This method outlines visual concepts, formulates image-specific
verification questions, utilizes the Q&A system to investigate the image, and
scores the combined outcome. Although this brave new hypothesis of mimicking
humans in the image evaluation process is in its preliminary assessment stage,
results are promising and open the door to a new form of automatic evaluation
which could have significant impact as image generation and targeted image
editing tasks become more and more sophisticated.
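The four steps named in the abstract (outline concepts, formulate verification questions, query a VQA system, score the combined outcome) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the comma-splitting stand-in for the LLM step, the yes/no question template, and the fraction-of-matches score are all assumptions made for illustration, and the VQA backend is passed in as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: a VQA backend maps (image, question) to a short answer.
VQAFn = Callable[[object, str], str]


@dataclass
class Check:
    question: str  # verification question posed to the VQA backend
    expected: str  # answer that counts as consistent with the prompt


def outline_concepts(prompt: str) -> List[str]:
    """Stand-in for the LLM step: split the prompt on commas and ' and '."""
    parts = prompt.replace(" and ", ",").split(",")
    return [p.strip() for p in parts if p.strip()]


def formulate_checks(concepts: List[str]) -> List[Check]:
    """Turn each concept into a yes/no question (assumed template)."""
    return [Check(f"Does the image show {c}?", "yes") for c in concepts]


def consistency_score(image: object, prompt: str, vqa: VQAFn) -> float:
    """Fraction of checks whose VQA answer matches the expected answer."""
    checks = formulate_checks(outline_concepts(prompt))
    if not checks:
        return 0.0
    hits = sum(
        vqa(image, c.question).strip().lower() == c.expected for c in checks
    )
    return hits / len(checks)


def toy_vqa(image: object, question: str) -> str:
    """Toy backend: the 'image' is a set of tags; answer yes if a tag is asked about."""
    return "yes" if any(tag in question for tag in image) else "no"


score = consistency_score({"red car", "tree"}, "a red car, a tree and a dog", toy_vqa)
```

In this toy run, two of the three concepts are found among the tags, so `score` is 2/3. Swapping `toy_vqa` for a real VQA model and `outline_concepts` for an LLM call would leave the control flow unchanged, which is the point of injecting the backend as a callable.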
Related papers
- GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation [103.3465421081531]
VQAScore is a metric measuring the likelihood that a VQA model views an image as accurately depicting the prompt.
Ranking by VQAScore is 2x to 3x more effective than other scoring methods like PickScore, HPSv2, and ImageReward.
We will release a new GenAI-Rank benchmark with over 40,000 human ratings to evaluate scoring metrics on ranking images generated from the same prompt.
arXiv Detail & Related papers (2024-06-19T18:00:07Z)
- Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment [20.851102845794244]
Distilling high-level knowledge about quality-bearing attributes is crucial for developing objective Image Quality Assessment (IQA).
We present a new blind IQA (BIQA) model termed Self-supervision and Vision-Language supervision Image QUality Evaluator (SLIQUE).
SLIQUE features a joint vision-language and visual contrastive representation learning framework for acquiring high-level knowledge about the image's semantic contents, distortion characteristics and appearance properties for IQA.
arXiv Detail & Related papers (2024-06-14T09:18:28Z)
- Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning [58.41087653543607]
We first establish a novel Image Quality Assessment (IQA) database for AIGIs, termed AIGCIQA2023+.
This paper presents a MINT-IQA model to evaluate and explain human preferences for AIGIs from Multi-perspectives with INstruction Tuning.
arXiv Detail & Related papers (2024-05-12T17:45:11Z)
- Bridging the Gap Between Saliency Prediction and Image Quality Assessment [0.0]
Deep neural models have made considerable advances in image quality assessment (IQA).
We conduct an empirical study that reveals the relation between IQA and Saliency Prediction tasks.
We introduce a novel SACID dataset of saliency-aware compressed images and conduct a large-scale comparison of classic and neural-based IQA methods.
arXiv Detail & Related papers (2024-05-08T12:04:43Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Conformer and Blind Noisy Students for Improved Image Quality Assessment [80.57006406834466]
Learning-based approaches for perceptual image quality assessment (IQA) usually require both the distorted and reference image for measuring the perceptual quality accurately.
In this work, we explore the performance of transformer-based full-reference IQA models.
We also propose a method for IQA based on semi-supervised knowledge distillation from full-reference teacher models into blind student models.
arXiv Detail & Related papers (2022-04-27T10:21:08Z)
- A survey on IQA [0.0]
This article reviews the concepts and metrics of image quality assessment and video quality assessment.
It briefly introduces some methods of full-reference and semi-reference image quality assessment, and focuses on non-reference image quality assessment methods based on deep learning.
arXiv Detail & Related papers (2021-08-29T10:52:27Z)