Related papers: Fast Prompt Alignment for Text-to-Image Generation

Fast Prompt Alignment for Text-to-Image Generation

URL: http://arxiv.org/abs/2412.08639v1
Date: Wed, 11 Dec 2024 18:58:41 GMT
Title: Fast Prompt Alignment for Text-to-Image Generation
Authors: Khalil Mrini, Hanlin Lu, Linjie Yang, Weilin Huang, Heng Wang,
Abstract summary: This paper introduces Fast Prompt Alignment (FPA), a prompt optimization framework that leverages a one-pass approach.<n>FPA uses large language models (LLMs) for single-iteration prompt paraphrasing, followed by fine-tuning or in-context learning with optimized prompts.<n>FPA achieves competitive text-image alignment scores at a fraction of the processing time.
Score: 28.66112701912297
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-image generation has advanced rapidly, yet aligning complex textual prompts with generated visuals remains challenging, especially with intricate object relationships and fine-grained details. This paper introduces Fast Prompt Alignment (FPA), a prompt optimization framework that leverages a one-pass approach, enhancing text-to-image alignment efficiency without the iterative overhead typical of current methods like OPT2I. FPA uses large language models (LLMs) for single-iteration prompt paraphrasing, followed by fine-tuning or in-context learning with optimized prompts to enable real-time inference, reducing computational demands while preserving alignment fidelity. Extensive evaluations on the COCO Captions and PartiPrompts datasets demonstrate that FPA achieves competitive text-image alignment scores at a fraction of the processing time, as validated through both automated metrics (TIFA, VQA) and human evaluation. A human study with expert annotators further reveals a strong correlation between human alignment judgments and automated scores, underscoring the robustness of FPA's improvements. The proposed method showcases a scalable, efficient alternative to iterative prompt optimization, enabling broader applicability in real-time, high-demand settings. The codebase is provided to facilitate further research: https://github.com/tiktok/fast_prompt_alignment

Related papers

IPGO: Indirect Prompt Gradient Optimization on Text-to-Image Generative Models with High Data Efficiency [16.559232159385193]
We introduce a novel framework, Indirect Prompt Gradient Optimization (IPGO), for prompt-level fine-tuning. IPGO enhances prompt embeddings by injecting continuously differentiable tokens at the beginning and end of the prompt embeddings. It allows for gradient-based optimization of injected tokens while enforcing value, orthonormality, and conformity constraints.
arXiv Detail & Related papers (2025-03-25T18:14:42Z)
Robustness-aware Automatic Prompt Optimization [45.43458098928881]
We propose BATprompt, a novel method for prompt generation designed to withstand input perturbations. Inspired by adversarial training techniques, BATprompt demonstrates strong performance on a variety of perturbed tasks. We evaluate BATprompt on multiple datasets across both language understanding and generation tasks.
arXiv Detail & Related papers (2024-12-24T06:05:08Z)
TIPO: Text to Image with Text Presampling for Prompt Optimization [16.001151202788304]
TIPO is an innovative framework designed to enhance text-to-image (T2I) generation by language model (LM) Unlike previous approaches that rely on Large Language Models (LLMs) or reinforcement learning (RL), TIPO adjusts user input prompts with the distribution of a trained prompt dataset.
arXiv Detail & Related papers (2024-11-12T19:09:45Z)
SCULPT: Systematic Tuning of Long Prompts [17.00433893207345]
We propose a framework that treats prompt optimization as a hierarchical tree refinement problem. SCULPT represents prompts as tree structures, enabling targeted modifications while preserving contextual integrity. It produces more stable and interpretable prompt modifications, ensuring better generalization across tasks.
arXiv Detail & Related papers (2024-10-28T07:10:10Z)
IPO: Interpretable Prompt Optimization for Vision-Language Models [40.83071220530289]
This paper introduces a simple but interpretable prompt (IPO) IPO utilizes large language models (LLMs) to generate textual prompts dynamically. We incorporate a large multimodal model (LMM) to condition on visual content by generating image descriptions.
arXiv Detail & Related papers (2024-10-20T14:10:22Z)
In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality.<n>To handle these challenges, a direct solution is to generate high-confidence'' data from unsupervised downstream tasks.<n>We propose a novel approach, pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z)
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting [18.708185548091716]
FRAP is a simple, yet effective approach based on adaptively adjusting the per-token prompt weights to improve prompt-image alignment and authenticity of the generated images. We show FRAP generates images with significantly higher prompt-image alignment to prompts from complex datasets. We also explore combining FRAP with prompt rewriting LLM to recover their degraded prompt-image alignment.
arXiv Detail & Related papers (2024-08-21T15:30:35Z)
MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning [68.40755873520808]
MultiPrompter is a new framework that views prompt optimization as a cooperative game between prompters. We show that MultiPrompter effectively reduces the problem size and helps prompters learn optimal prompts.
arXiv Detail & Related papers (2023-10-25T15:58:51Z)
Discrete Prompt Optimization via Constrained Generation for Zero-shot Re-ranker [0.2580765958706853]
Large-scale language model (LLM) is utilized as a zero-shot re-ranker with excellent results. LLM is highly dependent on the prompts, the impact and the optimization of the prompts for the zero-shot re-ranker are not explored yet. We propose a novel discrete prompt optimization method, Constrained Prompt generation (Co-Prompt), with the metric estimating the optimum for re-ranking.
arXiv Detail & Related papers (2023-05-23T06:35:33Z)
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery [55.905769757007185]
We describe an approach to robustly optimize hard text prompts through efficient gradient-based optimization. Our approach automatically generates hard text-based prompts for both text-to-image and text-to-text applications. In the text-to-text setting, we show that hard prompts can be automatically discovered that are effective in tuning LMs for classification.
arXiv Detail & Related papers (2023-02-07T18:40:18Z)
TEMPERA: Test-Time Prompting via Reinforcement Learning [57.48657629588436]
We propose Test-time Prompt Editing using Reinforcement learning (TEMPERA) In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge. Our method achieves 5.33x on average improvement in sample efficiency when compared to the traditional fine-tuning methods.
arXiv Detail & Related papers (2022-11-21T22:38:20Z)
CPL: Counterfactual Prompt Learning for Vision and Language Models [76.18024920393245]
This paper presents a novel underlinetextbfCounterfactual underlinetextbfPrompt underlinetextbfLearning (CPL) method for vision and language models. CPL simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework. Experiments demonstrate that CPL can obtain superior few-shot performance on different vision and language tasks.
arXiv Detail & Related papers (2022-10-19T08:06:39Z)
RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL) RLPrompt is flexibly applicable to different types of LMs, such as masked gibberish (e.g., grammaBERT) and left-to-right models (e.g., GPTs) Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z)
Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding [78.71529237748018]
Grounding temporal video segments described in natural language queries effectively and efficiently is a crucial capability needed in vision-and-language fields. Most existing approaches adopt elaborately designed cross-modal interaction modules to improve the grounding performance. We propose a commonsense-aware cross-modal alignment framework, which incorporates commonsense-guided visual and text representations into a complementary common space.
arXiv Detail & Related papers (2022-04-04T13:07:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.