Related papers: Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation

URL: http://arxiv.org/abs/2307.09416v2
Date: Wed, 19 Jul 2023 08:27:50 GMT
Title: Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation
Authors: Federico Betti, Jacopo Staiano, Lorenzo Baraldi, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe
Abstract summary: We introduce an automated method for Visual Concept Evaluation (ViCE) to assess consistency between a generated/edited image and the corresponding prompt/instructions. ViCE combines the strengths of Large Language Models (LLMs) and Visual Question Answering (VQA) into a unified pipeline, aiming to replicate the human cognitive process in quality assessment.
Score: 96.74302670358145
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Research in Image Generation has recently made significant progress, particularly boosted by the introduction of Vision-Language models which are able to produce high-quality visual content based on textual inputs. Despite ongoing advancements in terms of generation quality and realism, no methodical frameworks have been defined yet to quantitatively measure the quality of the generated content and the adherence with the prompted requests: so far, only human-based evaluations have been adopted for quality satisfaction and for comparing different generative methods. We introduce a novel automated method for Visual Concept Evaluation (ViCE), i.e. to assess consistency between a generated/edited image and the corresponding prompt/instructions, with a process inspired by the human cognitive behaviour. ViCE combines the strengths of Large Language Models (LLMs) and Visual Question Answering (VQA) into a unified pipeline, aiming to replicate the human cognitive process in quality assessment. This method outlines visual concepts, formulates image-specific verification questions, utilizes the Q&A system to investigate the image, and scores the combined outcome. Although this brave new hypothesis of mimicking humans in the image evaluation process is in its preliminary assessment stage, results are promising and open the door to a new form of automatic evaluation which could have significant impact as the image generation or the image target editing tasks become more and more sophisticated.
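The four steps named in the abstract (outline visual concepts, formulate verification questions, query a Q&A system about the image, score the combined outcome) can be sketched roughly as below. This is only an illustrative outline, not the authors' implementation: the function names, the callable interfaces, the toy stand-ins for the LLM and the VQA model, and the choice of scoring by the fraction of matching answers are all assumptions made for this example.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical interfaces; none of these names come from the paper.
ConceptExtractor = Callable[[str], List[str]]        # prompt -> visual concepts
QuestionWriter = Callable[[str], Tuple[str, str]]    # concept -> (question, expected answer)
VqaModel = Callable[[object, str], str]              # (image, question) -> answer


def consistency_score(
    prompt: str,
    image: object,
    extract_concepts: ConceptExtractor,
    write_question: QuestionWriter,
    answer_question: VqaModel,
) -> float:
    """Return the fraction of verification questions whose answer matches.

    Assumption: the combined outcome is scored as a simple match ratio.
    """
    concepts = extract_concepts(prompt)          # step 1: outline visual concepts
    if not concepts:
        return 0.0
    matches = 0
    for concept in concepts:
        question, expected = write_question(concept)   # step 2: verification question
        answer = answer_question(image, question)      # step 3: ask the Q&A system
        if answer.strip().lower() == expected.strip().lower():
            matches += 1
    return matches / len(concepts)               # step 4: score the combined outcome


# Toy stand-ins so the sketch runs without any model (assumptions, not real models).
def toy_extract(prompt: str) -> List[str]:
    # Treat each comma-separated phrase as one visual concept.
    return [c.strip() for c in prompt.split(",") if c.strip()]


def toy_question(concept: str) -> Tuple[str, str]:
    return (f"Is there {concept} in the image?", "yes")


def toy_vqa(image: Set[str], question: str) -> str:
    # The "image" is a set of concept strings it is known to contain.
    return "yes" if any(c in question for c in image) else "no"


if __name__ == "__main__":
    score = consistency_score(
        "a red car, a dog", {"a red car"}, toy_extract, toy_question, toy_vqa
    )
    print(score)
```

In practice the three callables would wrap an LLM and a VQA model; keeping them as injected functions makes the scoring logic testable without either.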

Related papers

Exploring Image Quality Assessment from a New Perspective: Pupil Size [58.577929564744146]
This paper explores how the image quality assessment (IQA) task affects the cognitive processes of people from the perspective of pupil size. By analyzing the difference in pupil size between the two tasks, we find that people may activate the visual attention mechanism when evaluating image quality.
arXiv Detail & Related papers (2025-05-20T02:27:34Z)
A Picture is Worth a Thousand Prompts? Efficacy of Iterative Human-Driven Prompt Refinement in Image Regeneration Tasks [1.8563642867160601]
The creation of AI-generated images often involves refining the input prompt iteratively to achieve desired visual outcomes. This study focuses on the relatively underexplored concept of image regeneration using AI. We present a structured user study evaluating how iterative prompt refinement affects the similarity of regenerated images relative to their targets.
arXiv Detail & Related papers (2025-04-29T01:21:16Z)
Scene Perceived Image Perceptual Score (SPIPS): combining global and local perception for image quality assessment [0.0]
We propose a novel IQA approach that bridges the gap between deep learning methods and human perception. Our model disentangles deep features into high-level semantic information and low-level perceptual details, treating each stream separately. This hybrid design enables the model to assess both global context and intricate image details, better reflecting the human visual process.
arXiv Detail & Related papers (2025-04-24T04:06:07Z)
Embodied Image Quality Assessment for Robotic Intelligence [36.80460733311791]
We first propose an embodied image quality assessment (EIQA) framework. We establish assessment metrics for input images based on the downstream tasks of robots. Experiments demonstrate that quality assessment of embodied images is different from that of humans.
arXiv Detail & Related papers (2024-12-25T04:29:22Z)
KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models. We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals. Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities [32.03360188710995]
AI-generated and enhanced content must be visually accurate, adhere to intended use, and maintain high visual quality. One way to monitor and control the visual "quality" of AI-generated and -enhanced content is by deploying Image Quality Assessment (IQA) and Video Quality Assessment (VQA) models. This paper examines the current shortcomings and possibilities presented by AI-generated and enhanced image and video content.
arXiv Detail & Related papers (2024-10-11T05:08:44Z)
Visual Verity in AI-Generated Imagery: Computational Metrics and Human-Centric Analysis [0.0]
We introduced and validated a questionnaire called Visual Verity, which measures photorealism, image quality, and text-image alignment. We also analyzed statistical properties, finding that camera-generated images scored lower in hue, saturation, and brightness. Our findings highlight the need for refining computational metrics to better capture human visual perception.
arXiv Detail & Related papers (2024-08-22T23:29:07Z)
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation [103.3465421081531]
VQAScore is a metric measuring the likelihood that a VQA model views an image as accurately depicting the prompt. Ranking by VQAScore is 2x to 3x more effective than other scoring methods like PickScore, HPSv2, and ImageReward. We release a new GenAI-Rank benchmark with over 40,000 human ratings to evaluate scoring metrics on ranking images generated from the same prompt.
arXiv Detail & Related papers (2024-06-19T18:00:07Z)
Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment [20.851102845794244]
Distilling high level knowledge about quality bearing attributes is crucial for developing objective Image Quality Assessment (IQA). We present a new blind IQA (BIQA) model termed Self-supervision and Vision-Language supervision Image QUality Evaluator (SLIQUE). SLIQUE features a joint vision-language and visual contrastive representation learning framework for acquiring high level knowledge about the images' semantic contents, distortion characteristics and appearance properties for IQA.
arXiv Detail & Related papers (2024-06-14T09:18:28Z)
Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning [58.41087653543607]
We first establish a novel Image Quality Assessment (IQA) database for AIGIs, termed AIGCIQA2023+. This paper presents a MINT-IQA model to evaluate and explain human preferences for AIGIs from Multi-perspectives with INstruction Tuning.
arXiv Detail & Related papers (2024-05-12T17:45:11Z)
Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner. Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.