Minority-Focused Text-to-Image Generation via Prompt Optimization
- URL: http://arxiv.org/abs/2410.07838v2
- Date: Mon, 25 Nov 2024 10:30:36 GMT
- Title: Minority-Focused Text-to-Image Generation via Prompt Optimization
- Authors: Soobin Um, Jong Chul Ye
- Abstract summary: We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models.
We develop an online prompt optimization framework that can encourage the emergence of desired properties.
We then tailor this generic prompt into a specialized solver that promotes the generation of minority features.
- Abstract: We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models. Minority instances, in the context of T2I generation, can be defined as ones living on low-density regions of text-conditional data distributions. They are valuable for various applications of modern T2I generators, such as data augmentation and creative AI. Unfortunately, existing pretrained T2I diffusion models primarily focus on high-density regions, largely due to the influence of guided samplers (like CFG) that are essential for producing high-quality generations. To address this, we present a novel framework to counter the high-density-focus of T2I diffusion models. Specifically, we first develop an online prompt optimization framework that can encourage the emergence of desired properties during inference while preserving semantic contents of user-provided prompts. We subsequently tailor this generic prompt optimizer into a specialized solver that promotes the generation of minority features by incorporating a carefully-crafted likelihood objective. Our comprehensive experiments, conducted across various types of T2I models, demonstrate that our approach significantly enhances the capability to produce high-quality minority instances compared to existing samplers.
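The abstract describes optimizing the prompt online during inference: nudge the prompt representation toward a minority-promoting likelihood objective while constraining it to stay semantically close to the user's prompt. The following is a deliberately toy, gradient-free sketch of that loop, not the paper's method: `toy_denoiser` and `minority_objective` are stand-ins for the real T2I denoising network and the paper's carefully-crafted likelihood term, and the trust-region projection is one assumed way to realize "preserving semantic contents of user-provided prompts".

```python
import numpy as np

def toy_denoiser(z, prompt_emb):
    # Stand-in for a T2I denoising step: pulls the latent toward a
    # point determined by the prompt embedding.
    return z - 0.1 * (z - prompt_emb)

def minority_objective(z):
    # Stand-in likelihood term: rewards latents far from the data mode
    # (the origin here), mimicking a low-density-seeking objective.
    return float(np.sum(z ** 2))

def optimize_prompt(user_emb, steps=50, lr=0.05, trust=1.0):
    """Toy online prompt optimization: randomly perturb the prompt
    embedding, keep perturbations that raise the minority objective,
    and project back into a trust region around the user prompt so the
    original semantics are (crudely) preserved."""
    rng = np.random.default_rng(0)
    emb = user_emb.copy()
    z = rng.normal(size=user_emb.shape)
    best = minority_objective(toy_denoiser(z, emb))
    for _ in range(steps):
        cand = emb + lr * rng.normal(size=emb.shape)
        # Semantic-preservation constraint: stay within `trust` of the
        # user-provided prompt embedding.
        delta = cand - user_emb
        norm = np.linalg.norm(delta)
        if norm > trust:
            cand = user_emb + delta * (trust / norm)
        score = minority_objective(toy_denoiser(z, cand))
        if score > best:
            emb, best = cand, score
    return emb, best

emb, score = optimize_prompt(np.zeros(8))
print(np.linalg.norm(emb))  # never exceeds the trust radius of 1.0
```

In the paper's actual setting the perturbation would instead be a gradient step on the prompt embedding through the frozen diffusion model, but the structure — score a candidate prompt, accept improvements, constrain drift from the user prompt — is the same.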
Related papers
- Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization [68.69203905664524]
We introduce Diffusion-RPO, a new method designed to align diffusion-based T2I models with human preferences more effectively.
We have developed a new evaluation metric, style alignment, aimed at overcoming the challenges of high costs and low interpretability.
Our findings demonstrate that Diffusion-RPO outperforms established methods such as Supervised Fine-Tuning and Diffusion-DPO in tuning Stable Diffusion versions 1.5 and XL-1.0.
arXiv Detail & Related papers (2024-06-10T15:42:03Z)
- Controllable Generation with Text-to-Image Diffusion Models: A Survey [8.394970202694529]
Controllable generation studies aim to control pre-trained text-to-image (T2I) models to support novel conditions.
Our review begins with a brief introduction to the basics of denoising diffusion probabilistic models.
We then reveal the controlling mechanisms of diffusion models, theoretically analyzing how novel conditions are introduced into the denoising process.
arXiv Detail & Related papers (2024-03-07T07:24:18Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on Cityscapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data [20.998032566820907]
This paper proposes a novel DomainStudio approach to adapt DDPMs pre-trained on large-scale source datasets to target domains using limited data.
It is designed to keep the diversity of subjects provided by source domains and get high-quality and diverse adapted samples in target domains.
arXiv Detail & Related papers (2023-06-25T07:40:39Z)
- Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators [12.053125079460234]
We show how modern T2I generators can be used to simulate arbitrary interventions over such environmental factors.
Our empirical findings demonstrate that modern T2I generators like Stable Diffusion can indeed be used as a powerful interventional data augmentation mechanism.
arXiv Detail & Related papers (2022-12-21T18:07:39Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can be beneficial in a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
- A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.