CAP: Evaluation of Persuasive and Creative Image Generation
- URL: http://arxiv.org/abs/2412.10426v1
- Date: Tue, 10 Dec 2024 19:54:59 GMT
- Title: CAP: Evaluation of Persuasive and Creative Image Generation
- Authors: Aysan Aghazadeh, Adriana Kovashka
- Abstract summary: We introduce three evaluation metrics to assess Creativity, prompt Alignment, and Persuasiveness in generated advertisement images. Our findings reveal that current Text-to-Image models struggle with creativity, persuasiveness, and alignment when the input text conveys an implicit message. We introduce a simple yet effective approach to enhance T2I models' capabilities in producing images that are better aligned, more creative, and more persuasive.
- Score: 28.49695567630899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the task of advertisement image generation and introduce three evaluation metrics to assess Creativity, prompt Alignment, and Persuasiveness (CAP) in generated advertisement images. Despite recent advances in Text-to-Image (T2I) generation and the high quality of images these models produce for explicit descriptions, evaluating them remains challenging. Existing evaluation methods focus largely on assessing alignment with explicit, detailed descriptions, while evaluating alignment with visually implicit prompts remains an open problem. Additionally, creativity and persuasiveness are essential qualities that enhance the effectiveness of advertisement images, yet they are seldom measured. To address this, we propose three novel metrics for evaluating the creativity, alignment, and persuasiveness of generated images. Our findings reveal that current T2I models struggle with creativity, persuasiveness, and alignment when the input text conveys an implicit message. We further introduce a simple yet effective approach to enhance T2I models' capabilities in producing images that are better aligned, more creative, and more persuasive.
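The CAP metrics themselves are defined in the paper; the point made here is that alignment checks built around explicit descriptions miss implicit messages. As a loose illustration of the kind of surface-level baseline the abstract contrasts itself against, the sketch below scores a generated ad against an implicit message using a plain CLIP similarity contrast. The model name, file name, and candidate strings are illustrative assumptions, not part of CAP.

```python
# A naive CLIP-based alignment proxy -- NOT the CAP metric, only the kind of
# explicit-similarity baseline the abstract argues falls short for implicit
# ad messages. File name and message strings are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("generated_ad.png")  # hypothetical generated ad image
candidates = [
    "this drink gives you energy for the whole day",  # intended implicit message
    "a photo of an unrelated scene",                  # distractor
]

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, len(candidates))
probs = logits.softmax(dim=-1)
print(f"similarity-based alignment score: {probs[0, 0].item():.3f}")
```

A high score here only means the image resembles the literal sentence; it says nothing about whether the image creatively or persuasively conveys the message, which is the gap the CAP metrics target.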
Related papers
- Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias [52.590072198551944]
The aim of image personalization is to create images based on a user-provided subject.
Current methods face challenges in ensuring fidelity to the text prompt.
We introduce a novel training pipeline that incorporates an attractor to filter out distractions in training images.
arXiv Detail & Related papers (2025-03-09T14:14:02Z)
- Leveraging Large Models for Evaluating Novel Content: A Case Study on Advertisement Creativity [26.90276644134837]
We attempt to break down visual advertisement creativity into atypicality and originality.
With fine-grained human annotations, we propose a suite of tasks tailored to this subjective problem.
We also evaluate the alignment between state-of-the-art (SoTA) vision language models (VLM) and humans on our proposed benchmark.
arXiv Detail & Related papers (2025-02-26T04:28:03Z)
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
- DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation [40.478839423995296]
We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with prompt-as-input Text-to-Image Synthesis (TIS) models for interactive image creation.
Given a raw prompt/image and a user-specified instruction, DiffChat can effectively make appropriate modifications and generate the target prompt.
arXiv Detail & Related papers (2024-03-08T02:24:27Z)
- Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort.
The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts.
We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
arXiv Detail & Related papers (2023-12-27T21:12:21Z)
- DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination [140.1641573781066]
We introduce a novel task, Virtual Creatures Generation: Given a set of unlabeled images of the target concepts, we aim to train a T2I model capable of creating new, hybrid concepts.
We propose a new method called DreamCreature, which identifies and extracts the underlying sub-concepts.
The T2I model thus adapts to generate novel concepts with faithful structures and photorealistic appearance.
arXiv Detail & Related papers (2023-11-27T01:24:31Z)
- TIAM -- A Metric for Evaluating Alignment in Text-to-Image Generation [2.6890293832784566]
We propose a new metric based on prompt templates to study the alignment between the content specified in the prompt and the corresponding generated images.
We also find that image quality can vary drastically depending on the noise seed used to generate the images.
arXiv Detail & Related papers (2023-07-11T09:23:05Z)
- Transferring Visual Attributes from Natural Language to Verified Image Generation [4.834625048634076]
We propose a Natural Language to Verified Image generation approach (NL2VI) that converts a natural prompt into a visual prompt.
A T2I model then generates an image for the visual prompt, which is verified with VQA algorithms (a minimal sketch of this generate-and-verify pattern appears after this list).
Experiments show that aligning natural prompts with image generation can improve the consistency of the generated images by up to 11% over the state of the art.
arXiv Detail & Related papers (2023-05-24T11:08:26Z)
- Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models [103.61066310897928]
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.
While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt.
We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt.
We introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness of the generated images.
arXiv Detail & Related papers (2023-01-31T18:10:38Z)
- DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning [85.10894272034135]
Large-scale text-to-image generation models have achieved remarkable progress in synthesizing high-quality, feature-rich images with high resolution guided by texts.
Recent attempts have employed fine-tuning or prompt-tuning strategies to teach the pre-trained diffusion model novel concepts from a reference image set.
We present a simple yet effective method called DreamArtist, which employs a positive-negative prompt-tuning learning strategy.
arXiv Detail & Related papers (2022-11-21T10:37:56Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)
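Several entries above share a generate-then-verify pattern: NL2VI checks generated images with VQA, and TIAM probes alignment with prompt templates while noting that results vary with the noise seed. As a loose illustration of that pattern (not any paper's actual pipeline), the sketch below retries generation with fresh seeds until a set of VQA checks passes; `t2i_generate` and `vqa_answer` are hypothetical stand-ins for real models.

```python
# Hypothetical generate-then-verify loop in the spirit of NL2VI / TIAM.
# t2i_generate and vqa_answer are stand-ins, not real library calls.

def t2i_generate(prompt: str, seed: int):
    """Stand-in for a text-to-image model call; returns a generated image."""
    raise NotImplementedError("plug in a T2I model here")

def vqa_answer(image, question: str) -> str:
    """Stand-in for a VQA model call; returns a short textual answer."""
    raise NotImplementedError("plug in a VQA model here")

def verified_generation(prompt: str, checks: dict[str, str], max_tries: int = 4):
    """Regenerate with fresh seeds until every VQA check passes.

    `checks` maps a question to its expected answer, e.g.
    {"Is there a dog in the image?": "yes"}. Trying several seeds also
    reflects the observation that quality varies strongly with the seed.
    """
    for seed in range(max_tries):
        image = t2i_generate(prompt, seed=seed)
        if all(vqa_answer(image, q).strip().lower() == a.lower()
               for q, a in checks.items()):
            return image  # every check passed for this seed
    return None  # no seed satisfied all checks
```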