A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
- URL: http://arxiv.org/abs/2402.08265v2
- Date: Sun, 12 May 2024 21:02:59 GMT
- Title: A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
- Authors: Shentao Yang, Tianqi Chen, Mingyuan Zhou
- Abstract summary: We propose a tractable alignment objective that emphasizes the initial steps of the T2I reverse chain.
In experiments on single- and multiple-prompt generation, our method is competitive with strong relevant baselines.
- Score: 54.43177605637759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aligning text-to-image (T2I) diffusion models with preference has been gaining increasing research attention. While prior works directly optimize T2I models on preference data, these methods are developed under the bandit assumption of a latent reward on the entire diffusion reverse chain, ignoring the sequential nature of the generation process. This may harm both the efficacy and the efficiency of preference alignment. In this paper, we take a finer, dense-reward perspective and derive a tractable alignment objective that emphasizes the initial steps of the T2I reverse chain. In particular, we introduce temporal discounting into DPO-style explicit-reward-free objectives, to break the temporal symmetry therein and suit the T2I generation hierarchy. In experiments on single- and multiple-prompt generation, our method is competitive with strong relevant baselines, both quantitatively and qualitatively. Further investigations illustrate the insight behind our approach.
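To make the stated idea concrete, here is a minimal sketch (not the authors' released code) of how temporal discounting can break the symmetry of a DPO-style objective over the reverse chain: per-step log-ratios between the fine-tuned and reference denoisers are weighted by a discount factor gamma that emphasizes the early, high-noise steps, and gamma = 1 recovers the usual temporally symmetric, bandit-style aggregation. The function name, the per-step log-ratio inputs, and the default beta/gamma values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def discounted_dpo_loss(logr_w, logr_l, beta=1.0, gamma=0.95):
    """DPO-style preference loss over a diffusion reverse chain.

    logr_w, logr_l: shape-(T,) tensors of per-step log-ratios
    log pi_theta(x_{t-1} | x_t) - log pi_ref(x_{t-1} | x_t) for the
    preferred (w) and dispreferred (l) trajectories, ordered from the
    first (noisiest) reverse step to the last.
    """
    T = logr_w.shape[0]
    # geometric weights 1, gamma, gamma^2, ...: early steps, which fix
    # the global image layout, dominate the aggregated margin
    weights = gamma ** torch.arange(T, dtype=logr_w.dtype)
    margin = (weights * (logr_w - logr_l)).sum()
    return -F.logsigmoid(beta * margin)

# toy usage on random per-step log-ratios for a 50-step chain
print(float(discounted_dpo_loss(torch.randn(50), torch.randn(50))))
```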
Related papers
- Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization [68.69203905664524]
We introduce Diffusion-RPO, a new method designed to align diffusion-based T2I models with human preferences more effectively.
We have developed a new evaluation metric, style alignment, aimed at overcoming the challenges of high cost and low interpretability.
Our findings demonstrate that Diffusion-RPO outperforms established methods such as Supervised Fine-Tuning and Diffusion-DPO in tuning Stable Diffusion versions 1.5 and XL-1.0.
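As context for the Diffusion-DPO baseline named above, here is a minimal sketch of the DPO-style loss on denoising errors that this line of work builds on; Diffusion-RPO's exact relative, cross-pair objective is not detailed in this summary, and the default beta and the omitted per-timestep weighting are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def diffusion_dpo_loss(err_theta_w, err_ref_w, err_theta_l, err_ref_l, beta=2000.0):
    """DPO-style loss on squared eps-prediction errors.

    err_*_w / err_*_l: per-example denoising MSEs of the trainable
    (theta) and frozen reference (ref) models on the preferred (w) and
    dispreferred (l) images at a shared random timestep.
    """
    inside = (err_theta_w - err_ref_w) - (err_theta_l - err_ref_l)
    return -F.logsigmoid(-beta * inside).mean()

# toy usage on a batch of 8 preference pairs
errs = [torch.rand(8) for _ in range(4)]
print(float(diffusion_dpo_loss(*errs)))
```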
arXiv Detail & Related papers (2024-06-10T15:42:03Z)
- Direct Consistency Optimization for Compositional Text-to-Image Personalization [73.94505688626651]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency.
We propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the deviation from the pretrained model.
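One hedged reading of the stated recipe as a loss sketch: a consistency term that fits the reference images plus a penalty on deviation from the frozen pretrained model. The trade-off weight lam and the use of eps-prediction MSE for both terms are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dco_style_loss(eps_theta, eps_pretrained, eps_true, lam=0.5):
    """eps_theta: fine-tuned model's noise prediction; eps_pretrained:
    frozen pretrained model's prediction; eps_true: ground-truth noise."""
    consistency = F.mse_loss(eps_theta, eps_true)        # fit the references
    deviation = F.mse_loss(eps_theta, eps_pretrained)    # stay near the prior
    return consistency + lam * deviation
```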
arXiv Detail & Related papers (2024-02-19T09:52:41Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation tasks on NYU Depth V2 and KITTI, and in the semantic segmentation task on Cityscapes.
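A minimal sketch of the learnable-prompt idea as summarized: a bank of trainable tokens appended to the conditioning sequence of a frozen diffusion backbone, so that only the prompts (and a task head, omitted here) are trained for perception. The prompt count, dimension, and initialization scale are assumptions.

```python
import torch
import torch.nn as nn

class MetaPrompts(nn.Module):
    def __init__(self, num_prompts=64, dim=768):
        super().__init__()
        # small-scale init, as is common for learnable prompt tokens
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, cond_tokens):  # cond_tokens: (B, N, dim)
        b = cond_tokens.shape[0]
        meta = self.prompts.unsqueeze(0).expand(b, -1, -1)
        return torch.cat([cond_tokens, meta], dim=1)  # (B, N + num_prompts, dim)

# toy usage: append 64 meta prompts to a batch of 77 text tokens
print(MetaPrompts()(torch.randn(2, 77, 768)).shape)
```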
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models [58.46926334842161]
This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps.
We propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores.
Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability.
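One plausible instantiation of the two objectives over per-object cross-attention maps: an overlap penalty (Separate) and a peak-activation reward (Enhance). The normalization and aggregation here are assumptions; the paper's exact formulations may differ.

```python
import torch

def separate_enhance_losses(attn_a, attn_b, eps=1e-8):
    """attn_a, attn_b: (H, W) cross-attention maps for two objects,
    assumed normalized to [0, 1]."""
    # Separate: penalize spatial overlap between the two objects' maps
    separate = (attn_a * attn_b).sum() / (attn_a.sum() + attn_b.sum() + eps)
    # Enhance: reward strong peak activation for each object
    enhance = (1.0 - attn_a.max()) + (1.0 - attn_b.max())
    return separate, enhance
```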
arXiv Detail & Related papers (2023-12-10T22:07:42Z)
- Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion [41.758635460235716]
We introduce the Second-order Tweedie sampler from Surrogate Loss (STSL).
STSL offers efficiency comparable to first-order Tweedie with a tractable reverse process using second-order approximation.
Our method surpasses SoTA solvers PSLD and P2L, achieving 4x and 8x reductions in neural function evaluations.
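For orientation, the first-order Tweedie posterior mean that such solvers build on, in standard DDPM notation; the second-order correction that STSL makes tractable via a surrogate loss involves the Hessian of the log-density and is not reproduced here.

```latex
\mathbb{E}[x_0 \mid x_t] = \frac{x_t + (1 - \bar{\alpha}_t)\,\nabla_{x_t}\log p_t(x_t)}{\sqrt{\bar{\alpha}_t}}
```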
arXiv Detail & Related papers (2023-12-01T14:36:24Z)
- Debiasing the Cloze Task in Sequential Recommendation with Bidirectional Transformers [0.0]
We argue that Inverse Propensity Scoring (IPS) does not extend to sequential recommendation because it fails to account for the temporal nature of the problem.
We then propose a novel propensity scoring mechanism, which can theoretically debias the Cloze task in sequential recommendation.
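A minimal sketch of plain IPS reweighting, to show what is being debiased: per-interaction losses are weighted by inverse, clipped propensities so over-exposed items do not dominate training. The paper's sequential propensity model, which accounts for temporal exposure, is not reproduced here, and the clip threshold is an assumption.

```python
import torch

def ips_weighted_loss(losses, propensities, clip_min=0.1):
    """losses: per-interaction training losses; propensities: estimated
    exposure probabilities of the observed items."""
    weights = 1.0 / propensities.clamp(min=clip_min)  # clipping bounds variance
    return (weights * losses).mean()
```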
arXiv Detail & Related papers (2023-01-22T21:44:25Z)
- Improving Crowded Object Detection via Copy-Paste [6.941267349187447]
Crowdedness caused by overlapping among similar objects is a ubiquitous challenge in the field of 2D visual object detection.
We first underline two main effects of the crowdedness issue: 1) IoU-confidence correlation disturbances (ICD) and 2) confused de-duplication (CDD).
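For reference, a generic copy-paste augmentation sketch; the paper's crowdedness-targeted pasting strategy and its treatment of ICD and CDD are not reproduced, and an in-bounds offset is an assumption the caller must satisfy.

```python
import numpy as np

def copy_paste(src_img, src_mask, dst_img, offset=(0, 0)):
    """Paste the object selected by the boolean mask src_mask from
    src_img into a copy of dst_img, shifted by offset (dy, dx)."""
    out = dst_img.copy()
    ys, xs = np.nonzero(src_mask)
    out[ys + offset[0], xs + offset[1]] = src_img[ys, xs]
    return out
```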
arXiv Detail & Related papers (2022-11-22T09:25:15Z)
- Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
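For reference, the information-directed sampling principle (due to Russo and Van Roy) that the method adapts to duels: at each round, pick the action minimizing the ratio of squared expected instantaneous regret to information gain about the optimum.

```latex
a_t \in \arg\min_{a} \frac{\Delta_t(a)^2}{I_t(a)}
```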
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.