Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
- URL: http://arxiv.org/abs/2403.19103v1
- Date: Thu, 28 Mar 2024 02:35:53 GMT
- Title: Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
- Authors: Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter
- Abstract summary: PRISM is an algorithm that automatically identifies human-interpretable and transferable prompts.
It can effectively generate desired concepts given only black-box access to T2I models.
Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, styles and images.
- Score: 150.57983348059528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt engineering is effective for controlling the output of text-to-image (T2I) generative models, but it is also laborious due to the need for manually crafted prompts. This challenge has spurred the development of algorithms for automated prompt generation. However, these methods often struggle with transferability across T2I models, require white-box access to the underlying model, and produce non-intuitive prompts. In this work, we introduce PRISM, an algorithm that automatically identifies human-interpretable and transferable prompts that can effectively generate desired concepts given only black-box access to T2I models. Inspired by large language model (LLM) jailbreaking, PRISM leverages the in-context learning ability of LLMs to iteratively refine the candidate prompts distribution for given reference images. Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, styles and images across multiple T2I models, including Stable Diffusion, DALL-E, and Midjourney.
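The abstract describes an iterative black-box loop: an LLM proposes candidate prompts, the T2I model is queried with each candidate, and the output is scored against the reference concept, with the (prompt, score) history fed back to the LLM as in-context feedback. The sketch below illustrates only that loop; `propose`, `generate`, and `score` are placeholder callables standing in for an LLM prompt engineer, a black-box T2I API, and an image-similarity metric (e.g. a CLIP-style score), not PRISM's actual components.

```python
from typing import Any, Callable, List, Tuple

def prism_style_search(
    reference_caption: str,
    propose: Callable[[List[Tuple[str, float]]], str],  # LLM: (prompt, score) history -> new candidate prompt
    generate: Callable[[str], Any],                      # black-box T2I query: prompt -> image
    score: Callable[[Any], float],                       # similarity of the image to the reference concept
    n_iters: int = 20,
) -> Tuple[str, float]:
    """Iteratively refine a human-readable prompt using only black-box access
    to the T2I model, keeping the best-scoring candidate seen so far."""
    history: List[Tuple[str, float]] = []  # in-context feedback shown to the LLM on each round
    best_prompt, best_score = reference_caption, float("-inf")
    for _ in range(n_iters):
        candidate = propose(history)       # LLM proposes a new prompt given prior prompts and their scores
        image = generate(candidate)        # query the T2I model (Stable Diffusion, DALL-E, ...)
        s = score(image)                   # how well the output matches the reference images
        history.append((candidate, s))
        if s > best_score:
            best_prompt, best_score = candidate, s
    return best_prompt, best_score
```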
Related papers
- RT-Attack: Jailbreaking Text-to-Image Models via Random Token [24.61198605177661]
We introduce a two-stage query-based black-box attack method utilizing random search.
In the first stage, we establish a preliminary prompt by maximizing the semantic similarity between the adversarial and target harmful prompts.
In the second stage, we use this initial prompt to refine our approach, creating a detailed adversarial prompt aimed at jailbreaking.
arXiv Detail & Related papers (2024-08-25T17:33:40Z)
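The RT-Attack entry above outlines a two-stage, query-based black-box search using random token swaps. A minimal sketch of that pattern is below; `text_sim` and `image_score` are hypothetical stand-ins for the paper's similarity objectives, not its exact losses.

```python
import random
from typing import Callable, List

def random_token_search(
    init_tokens: List[str],
    vocab: List[str],
    objective: Callable[[List[str]], float],
    n_queries: int = 500,
) -> List[str]:
    """Hill-climb by mutating one random token at a time, keeping any improvement."""
    best, best_score = list(init_tokens), objective(init_tokens)
    for _ in range(n_queries):
        cand = list(best)
        cand[random.randrange(len(cand))] = random.choice(vocab)  # random single-token swap
        s = objective(cand)
        if s > best_score:
            best, best_score = cand, s
    return best

def two_stage_search(
    target_prompt: str,
    vocab: List[str],
    text_sim: Callable[[str, str], float],   # semantic similarity between two prompts
    image_score: Callable[[str], float],     # score computed from images generated for a prompt
    length: int = 8,
) -> str:
    # Stage 1: maximize text-embedding similarity to the target prompt (no image queries yet).
    start = random.choices(vocab, k=length)
    stage1 = random_token_search(start, vocab, lambda t: text_sim(" ".join(t), target_prompt))
    # Stage 2: refine the stage-1 prompt against a score on the generated images.
    stage2 = random_token_search(stage1, vocab, lambda t: image_score(" ".join(t)))
    return " ".join(stage2)
```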
- Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models [59.16287352266203]
We introduce Adaptive Prompt-Tailored Pruning (APTP), a novel prompt-based pruning method for text-to-image (T2I) diffusion models.
APTP learns to determine the required capacity for an input text prompt and routes it to an architecture code, given a total desired compute budget for prompts.
APTP outperforms the single-model pruning baselines in terms of FID, CLIP, and CMMD scores.
arXiv Detail & Related papers (2024-06-17T19:22:04Z)
- Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models [71.49054220807983]
A prevalent limitation persists in effectively communicating with T2I models, such as Stable Diffusion, using natural language descriptions.
Inspired by the recently released DALLE3, we revisit existing T2I systems, endeavoring to align them with human intent, and introduce a new task: interactive text to image (iT2I).
We present a simple approach that augments LLMs for iT2I with prompting techniques and off-the-shelf T2I models.
arXiv Detail & Related papers (2023-10-11T16:53:40Z)
- BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing [73.74570290836152]
BLIP-Diffusion is a new subject-driven image generation model that supports multimodal control.
Unlike other subject-driven generation models, BLIP-Diffusion introduces a new multimodal encoder which is pre-trained to provide subject representation.
arXiv Detail & Related papers (2023-05-24T04:51:04Z)
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models [62.75006608940132]
This work proposes to enhance prompt understanding capabilities in text-to-image diffusion models.
Our method leverages a pretrained large language model for grounded generation in a novel two-stage process.
Our method significantly outperforms the base diffusion model and several strong baselines in accurately generating images.
arXiv Detail & Related papers (2023-05-23T03:59:06Z)
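The LLM-grounded Diffusion entry above only states that a pretrained LLM drives a two-stage grounded generation process. One plausible reading, consistent with the title, is that the LLM first produces an intermediate grounding (e.g. object boxes) for the prompt and a layout-conditioned model then renders the image. The helpers below (`llm`, `layout_to_image`) and the JSON layout format are illustrative assumptions, not the paper's interface.

```python
import json
from typing import Any, Callable, Dict, List

def llm_grounded_generation(
    prompt: str,
    llm: Callable[[str], str],                                     # text-in/text-out LLM (placeholder)
    layout_to_image: Callable[[List[Dict[str, Any]], str], Any],  # layout-conditioned generator (placeholder)
) -> Any:
    # Stage 1: ask the LLM for an intermediate grounding of the prompt (here: object boxes as JSON).
    layout_request = (
        "List the objects in this scene as JSON "
        '[{"object": ..., "box": [x, y, w, h]}]:\n' + prompt
    )
    layout = json.loads(llm(layout_request))
    # Stage 2: hand the grounding plus the original prompt to the layout-conditioned generator.
    return layout_to_image(layout, prompt)
```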
- If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection [53.320946030761796]
Diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt.
We show that large T2I diffusion models are more faithful than usually assumed, and can generate images faithful to even complex prompts.
We introduce a pipeline that generates candidate images for a text prompt and picks the best one according to an automatic scoring system.
arXiv Detail & Related papers (2023-05-22T17:59:41Z)
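The selection pipeline described in the entry above amounts to best-of-N sampling with an automatic scorer. A minimal sketch, assuming a placeholder `generate` T2I call and a `faithfulness` image-text metric (the paper's actual scoring system is not specified here):

```python
from typing import Any, Callable, List

def generate_and_select(
    prompt: str,
    generate: Callable[[str], Any],             # T2I model: prompt -> image (placeholder)
    faithfulness: Callable[[Any, str], float],  # automatic image-text faithfulness score (placeholder)
    n_candidates: int = 8,
) -> Any:
    """Sample several images for one prompt and return the one rated most faithful."""
    candidates: List[Any] = [generate(prompt) for _ in range(n_candidates)]
    return max(candidates, key=lambda img: faithfulness(img, prompt))
```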