Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
- URL: http://arxiv.org/abs/2401.05675v2
- Date: Mon, 15 Jul 2024 17:19:18 GMT
- Title: Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
- Authors: Seung Hyun Lee, Yinxiao Li, Junjie Ke, Innfarn Yoo, Han Zhang, Jiahui Yu, Qifei Wang, Fei Deng, Glenn Entis, Junfeng He, Gang Li, Sangpil Kim, Irfan Essa, Feng Yang
- Abstract summary: We propose Parrot, which addresses the issue of manually adjusting reward weights.
We use a novel multi-reward optimization algorithm to jointly optimize the T2I model and a prompt expansion network.
We also introduce original prompt-centered guidance at inference time, ensuring fidelity to user input after prompt expansion.
- Score: 40.74782694945025
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent works have demonstrated that using reinforcement learning (RL) with multiple quality rewards can improve the quality of generated images in text-to-image (T2I) generation. However, manually adjusting the reward weights is challenging and can lead to over-optimization of certain metrics. To address this, we propose Parrot, which tackles the issue through multi-objective optimization and introduces an effective multi-reward optimization strategy to approximate the Pareto-optimal set. Using batch-wise Pareto-optimal selection, Parrot automatically identifies the optimal trade-off among the different rewards. We apply this novel multi-reward optimization algorithm to jointly optimize the T2I model and a prompt expansion network, yielding significant improvements in image quality and allowing the trade-off between the different rewards to be controlled with a reward-related prompt at inference time. Furthermore, we introduce original prompt-centered guidance at inference time, ensuring fidelity to the user input after prompt expansion. Extensive experiments and a user study validate the superiority of Parrot over several baselines across various quality criteria, including aesthetics, human preference, text-image alignment, and image sentiment.
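To make the batch-wise Pareto-optimal selection concrete, here is a minimal sketch (not Parrot's actual implementation) of non-dominated filtering over per-sample reward vectors: within a generated batch, only samples that no other sample beats on every reward are kept for the policy update. The reward names in the toy example are assumptions taken from the quality criteria listed above.

```python
import numpy as np

def non_dominated_mask(rewards: np.ndarray) -> np.ndarray:
    """Return a boolean mask of Pareto-optimal (non-dominated) rows.

    rewards: (batch, num_rewards) array, higher is better for every column.
    A sample is dominated if another sample is >= on all rewards and
    strictly > on at least one of them.
    """
    n = rewards.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        others = rewards[np.arange(n) != i]
        dominated = np.any(
            np.all(others >= rewards[i], axis=1)
            & np.any(others > rewards[i], axis=1)
        )
        mask[i] = not dominated
    return mask

# Toy usage: 4 samples scored on (aesthetics, preference, alignment, sentiment).
batch_rewards = np.array([
    [0.9, 0.2, 0.5, 0.4],
    [0.8, 0.1, 0.4, 0.3],   # dominated by the first row
    [0.3, 0.9, 0.6, 0.5],
    [0.5, 0.5, 0.9, 0.9],
])
print(non_dominated_mask(batch_rewards))  # [ True False  True  True]
```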
Related papers
- FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting [18.708185548091716]
FRAP is a simple yet effective approach that adaptively adjusts per-token prompt weights to improve prompt-image alignment and the authenticity of the generated images.
We show that FRAP generates images with significantly higher prompt-image alignment for prompts from complex datasets.
We also explore combining FRAP with a prompt-rewriting LLM to recover the prompt-image alignment degraded by rewriting.
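As a rough illustration of the per-token prompt-weighting idea, one can scale each token embedding by a weight and raise the weights of tokens that receive too little cross-attention. This is a sketch under assumptions, not FRAP's actual update rule; `update_weights` and its `target`/`lr` parameters are hypothetical.

```python
import numpy as np

def weight_prompt_tokens(token_embeddings: np.ndarray,
                         token_weights: np.ndarray) -> np.ndarray:
    """Scale each token embedding by its per-token weight.

    token_embeddings: (num_tokens, dim) text-encoder outputs.
    token_weights:    (num_tokens,) positive weights.
    """
    return token_embeddings * token_weights[:, None]

def update_weights(token_weights: np.ndarray,
                   attention_per_token: np.ndarray,
                   target: float = 0.1,
                   lr: float = 0.5) -> np.ndarray:
    """Illustrative (hypothetical) rule: boost tokens whose average
    cross-attention mass falls below a target level."""
    deficit = np.maximum(target - attention_per_token, 0.0)
    return token_weights * (1.0 + lr * deficit)
```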
arXiv Detail & Related papers (2024-08-21T15:30:35Z)
- Prompt Recovery for Image Generation Models: A Comparative Study of Discrete Optimizers [58.50071292008407]
We present the first head-to-head comparison of recent discrete optimization techniques for the problem of prompt inversion.
We find that the CLIP similarity between the inverted prompt and the ground-truth image is a poor proxy for the similarity between the ground-truth image and the image generated from the inverted prompt.
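The gap between the two metrics can be expressed in a few lines. In this hedged sketch, `embed_image`, `embed_text`, and `generate` are assumed, hypothetical interfaces to a CLIP encoder and a text-to-image model, not calls to any specific library.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def proxy_vs_true(embed_image, embed_text, generate, gt_image, inverted_prompt):
    """Contrast the proxy metric with the image-to-image metric.

    Assumed (hypothetical) interfaces:
      embed_image(image) -> np.ndarray    CLIP image embedding
      embed_text(text)   -> np.ndarray    CLIP text embedding
      generate(prompt)   -> image         the text-to-image model
    """
    proxy = cosine(embed_text(inverted_prompt), embed_image(gt_image))
    regenerated = generate(inverted_prompt)
    true_score = cosine(embed_image(regenerated), embed_image(gt_image))
    return proxy, true_score  # the paper finds these can disagree substantially
```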
arXiv Detail & Related papers (2024-08-12T21:35:59Z)
- Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models [85.96013373385057]
Fine-tuning text-to-image models with reward functions trained on human feedback data has proven effective for aligning model behavior with human intent.
However, excessive optimization with such reward models, which serve as mere proxy objectives, can compromise the performance of fine-tuned models.
We propose TextNorm, a method that enhances alignment based on a measure of reward model confidence estimated across a set of semantically contrastive text prompts.
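One plausible instantiation (not necessarily TextNorm's exact formulation) of confidence estimated across semantically contrastive prompts is a softmax over the reward model's scores for the original prompt and its contrastive variants; `reward_fn` below is a hypothetical interface.

```python
import numpy as np

def contrastive_confidence(reward_fn, image, prompt, contrastive_prompts, tau=1.0):
    """Score how confidently a reward model prefers the true prompt for an image.

    Assumed (hypothetical) interface: reward_fn(image, text) -> float.
    Returns the softmax weight of the true prompt among contrastive alternatives,
    a stand-in for the confidence measure described in the paper.
    """
    scores = np.array([reward_fn(image, prompt)] +
                      [reward_fn(image, p) for p in contrastive_prompts])
    weights = np.exp(scores / tau)
    return float(weights[0] / weights.sum())
```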
arXiv Detail & Related papers (2024-04-02T11:40:38Z)
- Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation [21.983823344984483]
We study multi-reward reinforcement learning for jointly optimizing multiple text qualities in natural language generation.
We introduce two novel bandit methods, DynaOpt and C-DynaOpt, which rely on the broad strategy of combining the rewards into a single value so that they can be optimized simultaneously.
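For intuition, a generic epsilon-greedy bandit over candidate reward-weight vectors is sketched below. It illustrates the broad strategy of letting a bandit decide how to combine rewards during training, but it is not the DynaOpt or C-DynaOpt algorithm.

```python
import numpy as np

class WeightingBandit:
    """Epsilon-greedy bandit over candidate reward-weight vectors."""

    def __init__(self, arms, eps=0.1, seed=0):
        self.arms = arms                      # list of weight vectors
        self.eps = eps
        self.counts = np.zeros(len(arms))
        self.values = np.zeros(len(arms))
        self.rng = np.random.default_rng(seed)

    def select(self) -> int:
        """Pick an arm: explore with probability eps, otherwise exploit."""
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.arms)))
        return int(np.argmax(self.values))

    def combine(self, arm: int, rewards: np.ndarray) -> float:
        """Collapse the reward vector into a single scalar for RL training."""
        return float(np.dot(self.arms[arm], rewards))

    def update(self, arm: int, payoff: float) -> None:
        """Incrementally update the arm's estimated value from observed payoff."""
        self.counts[arm] += 1
        self.values[arm] += (payoff - self.values[arm]) / self.counts[arm]
```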
arXiv Detail & Related papers (2024-03-20T13:24:41Z)
- MORL-Prompt: An Empirical Analysis of Multi-Objective Reinforcement Learning for Discrete Prompt Optimization [45.410121761165634]
RL-based techniques can be employed to search for prompts that, when fed into a target language model, maximize a set of user-specified reward functions.
Current techniques focus on maximizing the average of reward functions, which does not necessarily lead to prompts that achieve balance across rewards.
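A two-objective toy example shows why averaging can fail to produce balanced prompts: the candidate with the highest mean reward can still be the most imbalanced one. The prompt names and numbers below are made up for illustration.

```python
import numpy as np

# Toy per-prompt rewards on two objectives (made-up numbers).
candidates = {
    "prompt A": np.array([0.99, 0.10]),  # highest average, badly imbalanced
    "prompt B": np.array([0.55, 0.50]),  # lower average, well balanced
}
best_by_mean = max(candidates, key=lambda k: candidates[k].mean())
best_by_min = max(candidates, key=lambda k: candidates[k].min())
print(best_by_mean)  # prompt A
print(best_by_min)   # prompt B
```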
arXiv Detail & Related papers (2024-02-18T21:25:09Z)
- OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization [65.57380193070574]
Vision-language pre-training models are vulnerable to multi-modal adversarial examples.
Recent works have indicated that leveraging data augmentation and image-text modal interactions can enhance the transferability of adversarial examples.
We propose an Optimal Transport-based Adversarial Attack, dubbed OT-Attack.
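The optimal-transport component can be illustrated with a generic entropy-regularized Sinkhorn solver that matches two sets of features, e.g., augmented image features and text token features. This is the standard textbook algorithm, not OT-Attack's full attack pipeline, and the feature sets in the usage example are random stand-ins.

```python
import numpy as np

def sinkhorn_plan(cost: np.ndarray, eps: float = 0.1, n_iters: int = 200) -> np.ndarray:
    """Entropy-regularized optimal transport between two uniform distributions.

    cost: (n, m) pairwise cost matrix; returns an (n, m) transport plan.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform source weights
    b = np.full(m, 1.0 / m)          # uniform target weights
    K = np.exp(-cost / eps)          # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):         # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Usage: match 5 "image" features to 7 "text" features via cosine cost.
rng = np.random.default_rng(0)
img = rng.normal(size=(5, 16))
txt = rng.normal(size=(7, 16))
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt /= np.linalg.norm(txt, axis=1, keepdims=True)
plan = sinkhorn_plan(1.0 - img @ txt.T)
print(plan.shape, plan.sum())        # (5, 7), total mass ~1.0
```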
arXiv Detail & Related papers (2023-12-07T16:16:50Z)
- HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models [56.112302700630806]
We introduce an innovative algorithm named HiFi Tuner to enhance the appearance preservation of objects during personalized image generation.
Key enhancements include the utilization of mask guidance, a novel parameter regularization technique, and the incorporation of step-wise subject representations.
We extend our method to a novel image editing task: substituting the subject in an image through textual manipulations.
arXiv Detail & Related papers (2023-11-30T02:33:29Z)
- Adaptive Image Registration: A Hybrid Approach Integrating Deep Learning and Optimization Functions for Enhanced Precision [13.242184146186974]
We propose a single framework for image registration based on deep neural networks and optimization.
We show improvements of up to 1.6% on test data while maintaining the same inference time, and a substantial gain of 1.0 percentage points in deformation-field smoothness.
arXiv Detail & Related papers (2023-11-27T02:48:06Z)
- MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning [68.40755873520808]
MultiPrompter is a new framework that views prompt optimization as a cooperative game between prompters.
We show that MultiPrompter effectively reduces the problem size and helps prompters learn optimal prompts.
arXiv Detail & Related papers (2023-10-25T15:58:51Z)
- Explainable bilevel optimization: an application to the Helsinki deblur challenge [1.1470070927586016]
We present a bilevel optimization scheme for solving a general image deblurring problem.
A parametric, variational-like approach is encapsulated within a machine learning scheme to produce a high-quality reconstructed image.
arXiv Detail & Related papers (2022-10-18T11:36:37Z)