Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation
- URL: http://arxiv.org/abs/2307.09416v2
- Date: Wed, 19 Jul 2023 08:27:50 GMT
- Title: Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation
- Authors: Federico Betti, Jacopo Staiano, Lorenzo Baraldi, Lorenzo Baraldi, Rita
Cucchiara, Nicu Sebe
- Abstract summary: We introduce an automated method for Visual Concept Evaluation (ViCE) to assess consistency between a generated/edited image and the corresponding prompt/instructions.
ViCE combines the strengths of Large Language Models (LLMs) and Visual Question Answering (VQA) into a unified pipeline, aiming to replicate the human cognitive process in quality assessment.
- Score: 96.74302670358145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research in Image Generation has recently made significant progress,
particularly boosted by the introduction of Vision-Language models which are
able to produce high-quality visual content based on textual inputs. Despite
ongoing advancements in terms of generation quality and realism, no methodical
frameworks have been defined yet to quantitatively measure the quality of the
generated content and its adherence to the prompted requests: so far, only
human-based evaluations have been adopted to assess quality and to compare
different generative methods. We introduce a novel automated method
for Visual Concept Evaluation (ViCE), i.e. to assess consistency between a
generated/edited image and the corresponding prompt/instructions, with a
process inspired by human cognitive behaviour. ViCE combines the strengths
of Large Language Models (LLMs) and Visual Question Answering (VQA) into a
unified pipeline, aiming to replicate the human cognitive process in quality
assessment. This method outlines visual concepts, formulates image-specific
verification questions, utilizes the Q&A system to investigate the image, and
scores the combined outcome. Although this hypothesis of mimicking human
behaviour in the image evaluation process is still at a preliminary stage, the
results are promising and open the door to a new form of automatic evaluation
which could have a significant impact as image generation and image editing
tasks become increasingly sophisticated.
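To make the pipeline description above concrete, here is a minimal sketch of a ViCE-style evaluation loop. It is illustrative only, not the authors' implementation: `llm_generate` and `vqa_answer` are hypothetical wrappers around whichever LLM and VQA models are plugged in, and the fraction-of-verified-concepts score is a placeholder for the paper's actual aggregation.

```python
# Minimal sketch of a ViCE-style evaluation loop (illustrative, not the authors' code).
# `llm_generate` and `vqa_answer` are hypothetical stand-ins for an LLM and a VQA model.
from typing import Callable, List


def vice_score(
    prompt: str,
    image_path: str,
    llm_generate: Callable[[str], List[str]],
    vqa_answer: Callable[[str, str], str],
) -> float:
    """Score how well an image matches a prompt by asking concept-level questions."""
    # 1) Outline the visual concepts implied by the prompt (LLM step).
    concepts = llm_generate(
        f"List the distinct visual concepts that must appear in an image of: '{prompt}'"
    )

    # 2) Formulate one image-specific verification question per concept (LLM step).
    questions = [
        f"Does the image contain {concept}? Answer yes or no." for concept in concepts
    ]

    # 3) Investigate the image with the VQA model and check each answer.
    passed = sum(
        1 for q in questions if vqa_answer(image_path, q).strip().lower().startswith("yes")
    )

    # 4) Score the combined outcome as the fraction of verified concepts (placeholder rule).
    return passed / max(len(questions), 1)
```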
Related papers
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
- Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities [32.03360188710995]
AI-generated and enhanced content must be visually accurate, adhere to intended use, and maintain high visual quality.
One way to monitor and control the visual "quality" of AI-generated and -enhanced content is by deploying Image Quality Assessment (IQA) and Video Quality Assessment (VQA) models.
This paper examines the current shortcomings and possibilities presented by AI-generated and enhanced image and video content.
arXiv Detail & Related papers (2024-10-11T05:08:44Z)
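As a rough illustration of the "deploy IQA models to monitor quality" idea in the entry above, the sketch below gates generated images on a no-reference quality score; `score_image` is a hypothetical stand-in for any IQA model, and the threshold is arbitrary.

```python
# Illustrative quality gate for generated images; `score_image` is a placeholder
# for any no-reference IQA model that returns higher-is-better scores.
from typing import Callable, Iterable, List, Tuple


def filter_by_quality(
    image_paths: Iterable[str],
    score_image: Callable[[str], float],
    threshold: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Split generated images into accepted/rejected sets based on an IQA score."""
    accepted, rejected = [], []
    for path in image_paths:
        score = score_image(path)
        (accepted if score >= threshold else rejected).append(path)
    return accepted, rejected
```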
- Visual Verity in AI-Generated Imagery: Computational Metrics and Human-Centric Analysis [0.0]
We introduce and validate a questionnaire called Visual Verity, which measures photorealism, image quality, and text-image alignment.
We also analyze statistical properties, finding that camera-generated images scored lower in hue, saturation, and brightness.
Our findings highlight the need for refining computational metrics to better capture human visual perception.
arXiv Detail & Related papers (2024-08-22T23:29:07Z)
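The hue/saturation/brightness comparison mentioned in the Visual Verity entry can be approximated with basic image statistics; the snippet below is only a sketch of the kind of measurement involved, not the paper's analysis code.

```python
# Rough sketch of per-image hue/saturation/brightness statistics (not the paper's pipeline).
import numpy as np
from PIL import Image


def hsv_means(path: str) -> dict:
    """Return the mean hue, saturation, and brightness (value) of an image in [0, 1]."""
    hsv = np.asarray(Image.open(path).convert("RGB").convert("HSV"), dtype=np.float32) / 255.0
    return {
        "hue": float(hsv[..., 0].mean()),
        "saturation": float(hsv[..., 1].mean()),
        "brightness": float(hsv[..., 2].mean()),
    }


# Comparing camera-captured and AI-generated sets would then reduce to comparing
# these per-image statistics across the two groups, e.g. with a significance test.
```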
- GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation [103.3465421081531]
VQAScore is a metric measuring the likelihood that a VQA model views an image as accurately depicting the prompt.
Ranking by VQAScore is 2x to 3x more effective than other scoring methods like PickScore, HPSv2, and ImageReward.
We release a new GenAI-Rank benchmark with over 40,000 human ratings to evaluate scoring metrics on ranking images generated from the same prompt.
arXiv Detail & Related papers (2024-06-19T18:00:07Z)
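The VQAScore idea from the GenAI-Bench entry can be approximated as the probability a VQA model assigns to "yes" when asked whether the image depicts the prompt. The sketch below uses BLIP-VQA from Hugging Face Transformers purely for illustration; the official VQAScore metric is built on a different model, and the question template here is an assumption.

```python
# Illustrative VQAScore-style check (not the official implementation or model).
import torch
from PIL import Image
from transformers import BlipForQuestionAnswering, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")


def vqa_yes_probability(image_path: str, prompt: str) -> float:
    """Probability that the VQA model's first answer token is 'yes'."""
    image = Image.open(image_path).convert("RGB")
    question = f"Does this figure show '{prompt}'? Answer yes or no."
    inputs = processor(images=image, text=question, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs, max_new_tokens=1, output_scores=True, return_dict_in_generate=True
        )
    probs = out.scores[0].softmax(dim=-1)  # distribution over the first generated token
    yes_id = processor.tokenizer.convert_tokens_to_ids("yes")
    return float(probs[0, yes_id])
```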
- Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment [20.851102845794244]
Distilling high-level knowledge about quality-bearing attributes is crucial for developing objective Image Quality Assessment (IQA).
We present a new blind IQA (BIQA) model termed Self-supervision and Vision-Language supervision Image QUality Evaluator (SLIQUE).
SLIQUE features a joint vision-language and visual contrastive representation learning framework for acquiring high-level knowledge about an image's semantic content, distortion characteristics, and appearance properties for IQA.
arXiv Detail & Related papers (2024-06-14T09:18:28Z)
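For context on the SLIQUE entry above, "joint vision-language contrastive representation learning" typically refers to a symmetric InfoNCE objective over paired image and text embeddings; the loss below is a generic illustration of that idea, not SLIQUE's actual training objective.

```python
# Generic symmetric InfoNCE loss over paired image/text embeddings (illustration only).
import torch
import torch.nn.functional as F


def contrastive_vision_language_loss(
    image_emb: torch.Tensor,  # (N, D) image embeddings
    text_emb: torch.Tensor,   # (N, D) embeddings of the paired text descriptions
    temperature: float = 0.07,
) -> torch.Tensor:
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (N, N) similarity matrix
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    # Matched pairs lie on the diagonal; pull them together, push mismatches apart.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```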
- Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning [58.41087653543607]
We first establish a novel Image Quality Assessment (IQA) database for AIGIs, termed AIGCIQA2023+.
This paper presents a MINT-IQA model to evaluate and explain human preferences for AIGIs from Multi-perspectives with INstruction Tuning.
arXiv Detail & Related papers (2024-05-12T17:45:11Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
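A zero-shot CLIP-based assessment of an image's "look", in the spirit of the last entry, can be sketched with an antonym prompt pair; the specific prompts and checkpoint below are illustrative assumptions rather than the paper's exact configuration.

```python
# Zero-shot CLIP quality probe with an antonym prompt pair (illustrative configuration).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def clip_quality_score(image_path: str) -> float:
    """Return the softmax probability assigned to the 'Good photo.' prompt."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=["Good photo.", "Bad photo."], images=image, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits_per_image = model(**inputs).logits_per_image  # (1, 2) image-text similarities
    return float(logits_per_image.softmax(dim=-1)[0, 0])
```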