Reinforcement Learning from Diffusion Feedback: Q* for Image Search
- URL: http://arxiv.org/abs/2311.15648v1
- Date: Mon, 27 Nov 2023 09:20:12 GMT
- Title: Reinforcement Learning from Diffusion Feedback: Q* for Image Search
- Authors: Aboli Marathe
- Abstract summary: We present two models for image generation using model-agnostic learning.
RLDF is a singular approach for visual imitation through prior-preserving reward function guidance.
It generates high-quality images over varied domains showcasing class-consistency and strong visual diversity.
- Score: 2.5835347022640254
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large vision-language models are steadily gaining personalization
capabilities at the cost of fine-tuning or data augmentation. We present two
models for image generation using model-agnostic learning that align semantic
priors with generative capabilities. RLDF, or Reinforcement Learning from
Diffusion Feedback, is a singular approach for visual imitation through
prior-preserving reward function guidance. This employs Q-learning (with
standard Q*) for generation and follows a semantic-rewarded trajectory for
image search through finite encoding-tailored actions. The second proposed
method, noisy diffusion gradient, is optimization driven. At the root of both
methods is a special CFG encoding that we propose for continual semantic
guidance. Using only a single input image and no text input, RLDF generates
high-quality images over varied domains including retail, sports and
agriculture showcasing class-consistency and strong visual diversity. Project
website is available at https://infernolia.github.io/RLDF.
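For intuition, the Q-learning component described in the abstract can be read as ordinary tabular Q-learning over a finite set of semantic-editing actions with a similarity-based reward. The following minimal Python sketch illustrates that generic recipe only; the toy state, action names, and reward below are hypothetical stand-ins and are not the authors' released RLDF code or their CFG encoding.

# Illustrative sketch only: generic tabular Q-learning over a finite action
# space with a semantic-similarity reward, in the spirit of the RLDF abstract.
# The environment, action set, and reward are hypothetical stand-ins.
import random
from collections import defaultdict

# Hypothetical finite action space: discrete edits to a semantic encoding
# (the paper's "encoding-tailored actions"; this concrete set is an assumption).
ACTIONS = ["shift_class", "shift_style", "shift_background", "keep"]

# Toy state: a tuple of three discrete semantic attributes.
TARGET = (2, 1, 3)  # attributes of the single reference image (assumed known)

def step(state, action):
    """Apply one discrete semantic edit and return the next state."""
    s = list(state)
    if action == "shift_class":
        s[0] = (s[0] + 1) % 4
    elif action == "shift_style":
        s[1] = (s[1] + 1) % 4
    elif action == "shift_background":
        s[2] = (s[2] + 1) % 4
    return tuple(s)

def semantic_reward(state):
    """Stand-in for a semantic reward (e.g. similarity between a generated
    image and the reference image); here, the number of matching attributes."""
    return sum(a == b for a, b in zip(state, TARGET))

alpha, gamma, eps = 0.5, 0.9, 0.2
Q = defaultdict(float)

for episode in range(500):
    state = (0, 0, 0)
    for t in range(20):
        # epsilon-greedy action selection over the finite action set
        if random.random() < eps:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt = step(state, action)
        r = semantic_reward(nxt)
        # standard Q-learning update toward the greedy (Q*) target
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
        state = nxt

print("greedy first action from the start state:",
      max(ACTIONS, key=lambda a: Q[((0, 0, 0), a)]))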
Related papers
- Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences [0.0]
We introduce Diff-Instruct++ (DI++), the first fast-converging, image-data-free human preference alignment method for one-step text-to-image generators.
In the experiments, we align both UNet-based and DiT-based one-step generators using DI++, with Stable Diffusion 1.5 and PixelArt-$\alpha$ as the reference diffusion processes.
The resulting DiT-based one-step text-to-image model achieves a strong Aesthetic Score of 6.19 and an Image Reward of 1.24 on the validation prompt dataset.
arXiv Detail & Related papers (2024-10-24T16:17:18Z) - FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z) - Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z) - Aligning Text-to-Image Diffusion Models with Reward Backpropagation [62.45086888512723]
We propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient.
We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler.
arXiv Detail & Related papers (2023-10-05T17:59:18Z) - Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework consisting of a Masked Quantization VAE (MQ-VAE) and a Stackformer, which relieves the model from modeling redundancy.
arXiv Detail & Related papers (2023-05-23T02:15:53Z) - Unleashing Text-to-Image Diffusion Models for Visual Perception [84.41514649568094]
VPD (Visual Perception with a pre-trained diffusion model) is a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks.
We show that the proposed VPD enables faster adaptation to downstream visual perception tasks.
arXiv Detail & Related papers (2023-03-03T18:59:47Z) - Align before Fuse: Vision and Language Representation Learning with Momentum Distillation [52.40490994871753]
We introduce a contrastive loss to ALign the image and text representations BEfore Fusing (ALBEF) them through cross-modal attention.
We propose momentum distillation, a self-training method which learns from pseudo-targets produced by a momentum model.
ALBEF achieves state-of-the-art performance on multiple downstream vision-language tasks.
arXiv Detail & Related papers (2021-07-16T00:19:22Z) - An Effective Automatic Image Annotation Model Via Attention Model and Data Equilibrium [0.0]
The proposed model has three phases, including a feature extractor, a tag generator, and an image annotator.
The experiments conducted on two benchmark datasets confirm the superiority of the proposed model over previous models.
arXiv Detail & Related papers (2020-01-26T05:59:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.