CAP: Evaluation of Persuasive and Creative Image Generation
- URL: http://arxiv.org/abs/2412.10426v1
- Date: Tue, 10 Dec 2024 19:54:59 GMT
- Title: CAP: Evaluation of Persuasive and Creative Image Generation
- Authors: Aysan Aghazadeh, Adriana Kovashka
- Abstract summary: We introduce three evaluation metrics to assess Creativity, prompt Alignment, and Persuasiveness in generated advertisement images. Our findings reveal that current Text-to-Image models struggle with creativity, persuasiveness, and alignment when the input text conveys an implicit message. We introduce a simple yet effective approach to enhance T2I models' capabilities in producing images that are better aligned, more creative, and more persuasive.
- Score: 28.49695567630899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the task of advertisement image generation and introduce three evaluation metrics to assess Creativity, prompt Alignment, and Persuasiveness (CAP) in generated advertisement images. Despite recent advances in Text-to-Image (T2I) generation and the high quality of images these models produce for explicit descriptions, evaluating them remains challenging. Existing evaluation methods focus largely on assessing alignment with explicit, detailed descriptions, while evaluating alignment with visually implicit prompts remains an open problem. Additionally, creativity and persuasiveness are essential qualities that enhance the effectiveness of advertisement images, yet they are seldom measured. To address this, we propose three novel metrics for evaluating the creativity, alignment, and persuasiveness of generated images. Our findings reveal that current T2I models struggle with creativity, persuasiveness, and alignment when the input text conveys an implicit message. We further introduce a simple yet effective approach to enhance T2I models' capabilities in producing images that are better aligned, more creative, and more persuasive.
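The CAP metrics themselves are defined in the paper; the point made here is that alignment checks built around explicit descriptions miss implicit messages. As a loose illustration of the kind of surface-level baseline the abstract contrasts itself against, the sketch below scores a generated ad against an implicit message using a plain CLIP similarity contrast. The model name, file name, and candidate strings are illustrative assumptions, not part of CAP.

```python
# A naive CLIP-based alignment proxy -- NOT the CAP metric, only the kind of
# explicit-similarity baseline the abstract argues falls short for implicit
# ad messages. File name and message strings are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("generated_ad.png")  # hypothetical generated ad image
candidates = [
    "this drink gives you energy for the whole day",  # intended implicit message
    "a photo of an unrelated scene",                  # distractor
]

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, len(candidates))
probs = logits.softmax(dim=-1)
print(f"similarity-based alignment score: {probs[0, 0].item():.3f}")
```

A high score here only means the image resembles the literal sentence; it says nothing about whether the image creatively or persuasively conveys the message, which is the gap the CAP metrics target.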
Related papers
- Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias [52.590072198551944]
The aim of image personalization is to create images based on a user-provided subject.
Current methods face challenges in ensuring fidelity to the text prompt.
We introduce a novel training pipeline that incorporates an attractor to filter out distractions in training images.
arXiv Detail & Related papers (2025-03-09T14:14:02Z)
- Leveraging Large Models for Evaluating Novel Content: A Case Study on Advertisement Creativity [26.90276644134837]
We attempt to break down visual advertisement creativity into atypicality and originality.
With fine-grained human annotations, we propose a suite of tasks tailored to this subjective problem.
We also evaluate the alignment between state-of-the-art (SoTA) vision language models (VLM) and humans on our proposed benchmark.
arXiv Detail & Related papers (2025-02-26T04:28:03Z)
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
- DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation [40.478839423995296]
We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with prompt-as-input Text-to-Image Synthesis (TIS) models for interactive image creation.
Given a raw prompt/image and a user-specified instruction, DiffChat can effectively make appropriate modifications and generate the target prompt.
arXiv Detail & Related papers (2024-03-08T02:24:27Z)
- Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort.
The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts.
We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
arXiv Detail & Related papers (2023-12-27T21:12:21Z)
- DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination [140.1641573781066]
We introduce a novel task, Virtual Creatures Generation: Given a set of unlabeled images of the target concepts, we aim to train a T2I model capable of creating new, hybrid concepts.
We propose a new method called DreamCreature, which identifies and extracts the underlying sub-concepts.
The T2I model thus adapts to generate novel concepts with faithful structures and photorealistic appearance.
arXiv Detail & Related papers (2023-11-27T01:24:31Z)
- TIAM -- A Metric for Evaluating Alignment in Text-to-Image Generation [2.6890293832784566]
We propose a new metric based on prompt templates to study the alignment between the content specified in the prompt and the corresponding generated images.
We also find that image quality can vary drastically depending on the noise seed used to generate the images.
arXiv Detail & Related papers (2023-07-11T09:23:05Z)
- Transferring Visual Attributes from Natural Language to Verified Image Generation [4.834625048634076]
We propose a Natural Language to Verified Image generation approach (NL2VI) that converts a natural prompt into a visual prompt.
A T2I model then generates an image for the visual prompt, which is verified with VQA algorithms (a minimal sketch of this generate-and-verify pattern appears after this list).
Experiments show that aligning natural prompts with image generation can improve the consistency of the generated images by up to 11% over the state of the art.
arXiv Detail & Related papers (2023-05-24T11:08:26Z)
- Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models [103.61066310897928]
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.
While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt.
We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt.
We introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness of the generated images.
arXiv Detail & Related papers (2023-01-31T18:10:38Z)
- DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning [85.10894272034135]
Large-scale text-to-image generation models have achieved remarkable progress in synthesizing high-quality, feature-rich images with high resolution guided by texts.
Recent attempts have employed fine-tuning or prompt-tuning strategies to teach the pre-trained diffusion model novel concepts from a reference image set.
We present a simple yet effective method called DreamArtist, which employs a positive-negative prompt-tuning learning strategy.
arXiv Detail & Related papers (2022-11-21T10:37:56Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)
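Several entries above share a generate-then-verify pattern: NL2VI checks generated images with VQA, and TIAM probes alignment with prompt templates while noting that results vary with the noise seed. As a loose illustration of that pattern (not any paper's actual pipeline), the sketch below retries generation with fresh seeds until a set of VQA checks passes; `t2i_generate` and `vqa_answer` are hypothetical stand-ins for real models.

```python
# Hypothetical generate-then-verify loop in the spirit of NL2VI / TIAM.
# t2i_generate and vqa_answer are stand-ins, not real library calls.

def t2i_generate(prompt: str, seed: int):
    """Stand-in for a text-to-image model call; returns a generated image."""
    raise NotImplementedError("plug in a T2I model here")

def vqa_answer(image, question: str) -> str:
    """Stand-in for a VQA model call; returns a short textual answer."""
    raise NotImplementedError("plug in a VQA model here")

def verified_generation(prompt: str, checks: dict[str, str], max_tries: int = 4):
    """Regenerate with fresh seeds until every VQA check passes.

    `checks` maps a question to its expected answer, e.g.
    {"Is there a dog in the image?": "yes"}. Trying several seeds also
    reflects the observation that quality varies strongly with the seed.
    """
    for seed in range(max_tries):
        image = t2i_generate(prompt, seed=seed)
        if all(vqa_answer(image, q).strip().lower() == a.lower()
               for q, a in checks.items()):
            return image  # every check passed for this seed
    return None  # no seed satisfied all checks
```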