A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
- URL: http://arxiv.org/abs/2402.08265v2
- Date: Sun, 12 May 2024 21:02:59 GMT
- Title: A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
- Authors: Shentao Yang, Tianqi Chen, Mingyuan Zhou
- Abstract summary: We propose a tractable alignment objective that emphasizes the initial steps of the T2I reverse chain.
In experiments on single and multiple prompt generation, our method is competitive with strong relevant baselines.
- Score: 54.43177605637759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aligning text-to-image (T2I) diffusion models with preference has been gaining increasing research attention. While prior works directly optimize T2I models with preference data, these methods are developed under the bandit assumption of a latent reward on the entire diffusion reverse chain, ignoring the sequential nature of the generation process. This may harm the efficacy and efficiency of preference alignment. In this paper, we take a finer dense reward perspective and derive a tractable alignment objective that emphasizes the initial steps of the T2I reverse chain. In particular, we introduce temporal discounting into DPO-style explicit-reward-free objectives, to break the temporal symmetry therein and suit the T2I generation hierarchy. In experiments on single and multiple prompt generation, our method is competitive with strong relevant baselines, both quantitatively and qualitatively. Further investigations are conducted to illustrate the insight of our approach.
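The core idea of the abstract — a DPO-style preference objective whose per-step terms are discounted so that early reverse-chain steps dominate — can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the per-step log-likelihood inputs, and the geometric discount `gamma` are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def discounted_dpo_loss(logp_w, logp_l, logp_ref_w, logp_ref_l,
                        beta=0.1, gamma=0.99):
    """Hypothetical DPO-style preference loss with temporal discounting.

    Each input is a tensor of shape (T,), holding per-step log-likelihoods
    over the T reverse-chain steps (index 0 = first denoising step) for the
    preferred (w) and dispreferred (l) trajectories, under the trained
    policy and a frozen reference model.
    """
    T = logp_w.shape[0]
    # Geometric discount: weights decay with step index, so the initial
    # reverse-chain steps contribute most (breaking temporal symmetry).
    weights = gamma ** torch.arange(T, dtype=logp_w.dtype)
    # Per-step implicit-reward margin: policy vs. reference, winner vs. loser.
    margin = (logp_w - logp_ref_w) - (logp_l - logp_ref_l)
    # Standard DPO logistic loss applied to the discounted margin sum.
    return -F.logsigmoid(beta * (weights * margin).sum())
```

Setting `gamma=1.0` recovers a vanilla (temporally symmetric) trajectory-level DPO objective, which makes the role of the discount easy to ablate.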
Related papers
- Motion Prior Distillation in Time Reversal Sampling for Generative Inbetweening [23.537461698380607]
We propose Motion Prior Distillation (MPD), a simple yet effective inference-time distillation technique. MPD suppresses bidirectional mismatch by distilling the motion residual of the forward path into the backward path. Our method can deliberately avoid denoising the end-conditioned path, which causes path ambiguity.
arXiv Detail & Related papers (2026-02-13T07:20:45Z) - DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models [55.30555646945055]
Text-to-Image (T2I) models are vulnerable to semantic leakage. We introduce DeLeaker, a lightweight approach that mitigates leakage by directly intervening on the model's attention maps. SLIM is the first dataset dedicated to semantic leakage.
arXiv Detail & Related papers (2025-10-16T17:39:21Z) - Diverse Text-to-Image Generation via Contrastive Noise Optimization [60.48914865049489]
Text-to-image (T2I) diffusion models have demonstrated impressive performance in generating high-fidelity images. Existing approaches typically optimize intermediate latents or text conditions during inference. We introduce Contrastive Noise Optimization, a simple yet effective method that addresses the diversity issue from a distinct perspective.
arXiv Detail & Related papers (2025-10-04T13:51:32Z) - MIRA: Towards Mitigating Reward Hacking in Inference-Time Alignment of T2I Diffusion Models [86.07486858219137]
Diffusion models excel at generating images conditioned on text prompts. The resulting images often do not satisfy user-specific criteria measured by scalar rewards such as Aesthetic Scores. Recently, inference-time alignment via noise optimization has emerged as an efficient alternative. We show that this approach suffers from reward hacking, where the model produces images that score highly, yet deviate significantly from the original prompt.
arXiv Detail & Related papers (2025-10-02T00:47:36Z) - Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs [36.42060582800515]
We introduce Text Preference Optimization (TPO), a framework that enables "free-lunch" alignment of T2I models. TPO works by training the model to prefer matched prompts over mismatched prompts. Our framework is general and compatible with existing preference-based algorithms.
arXiv Detail & Related papers (2025-09-30T04:32:34Z) - Enhancing Diffusion-based Unrestricted Adversarial Attacks via Adversary Preferences Alignment [26.95607772298534]
APA (Adversary Preferences Alignment) is a two-stage framework that decouples conflicting preferences and optimizes each with differentiable rewards. APA achieves significantly better attack transferability while maintaining high visual consistency, inspiring further research to approach adversarial attacks from an alignment perspective.
arXiv Detail & Related papers (2025-06-02T10:18:09Z) - Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models [52.877452505561706]
We propose the first copyright evasion attack specifically designed to undermine dataset ownership verification (DOV). Our CEAT2I comprises three stages: watermarked sample detection, trigger identification, and efficient watermark mitigation. Our experiments show that our CEAT2I effectively evades DOV mechanisms while preserving model performance.
arXiv Detail & Related papers (2025-05-05T17:51:55Z) - GenDR: Lightning Generative Detail Restorator [18.465568249533966]
We present a one-step diffusion model for generative detail restoration, GenDR, distilled from a tailored diffusion model with larger latent space.
Experimental results demonstrate that GenDR achieves state-of-the-art performance in both quantitative metrics and visual fidelity.
arXiv Detail & Related papers (2025-03-09T22:02:18Z) - Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design [87.58981407469977]
We propose a novel framework for inference-time reward optimization with diffusion models inspired by evolutionary algorithms.
Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising.
arXiv Detail & Related papers (2025-02-20T17:48:45Z) - Privacy Protection in Personalized Diffusion Models via Targeted Cross-Attention Adversarial Attack [5.357486699062561]
We propose a novel and efficient adversarial attack method, Concept Protection by Selective Attention Manipulation (CoPSAM).
For this purpose, we carefully construct an imperceptible noise to be added to clean samples to get their adversarial counterparts.
Experimental validation on a subset of CelebA-HQ face images dataset demonstrates that our approach outperforms existing methods.
arXiv Detail & Related papers (2024-11-25T14:39:18Z) - Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization [68.69203905664524]
We introduce Diffusion-RPO, a new method designed to align diffusion-based T2I models with human preferences more effectively.
We have developed a new evaluation metric, style alignment, aimed at overcoming the challenges of high cost and low interpretability.
Our findings demonstrate that Diffusion-RPO outperforms established methods such as Supervised Fine-Tuning and Diffusion-DPO in tuning Stable Diffusion versions 1.5 and XL-1.0.
arXiv Detail & Related papers (2024-06-10T15:42:03Z) - Direct Consistency Optimization for Compositional Text-to-Image Personalization [73.94505688626651]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency.
We propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the deviation from the pretrained model.
arXiv Detail & Related papers (2024-02-19T09:52:41Z) - Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in semantic segmentation task on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z) - Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models [58.46926334842161]
This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps.
We propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores.
Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability.
arXiv Detail & Related papers (2023-12-10T22:07:42Z) - Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion [41.758635460235716]
We introduce the Second-order Tweedie sampler from Surrogate Loss (STSL).
STSL offers efficiency comparable to first-order Tweedie with a tractable reverse process using second-order approximation.
Our method surpasses SoTA solvers PSLD and P2L, achieving 4X and 8X reduction in neural function evaluations.
arXiv Detail & Related papers (2023-12-01T14:36:24Z) - Debiasing the Cloze Task in Sequential Recommendation with Bidirectional
Transformers [0.0]
We argue that Inverse Propensity Scoring (IPS) does not extend to sequential recommendation because it fails to account for the temporal nature of the problem.
We then propose a novel propensity scoring mechanism, which can theoretically debias the Cloze task in sequential recommendation.
arXiv Detail & Related papers (2023-01-22T21:44:25Z) - Improving Crowded Object Detection via Copy-Paste [6.941267349187447]
Crowdedness caused by overlapping among similar objects is a ubiquitous challenge in the field of 2D visual object detection.
We first underline two main effects of the crowdedness issue: 1) IoU-confidence correlation disturbances (ICD) and 2) confused de-duplication (CDD).
arXiv Detail & Related papers (2022-11-22T09:25:15Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.