Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image
Classification and Generation
- URL: http://arxiv.org/abs/2308.07929v2
- Date: Thu, 21 Sep 2023 14:53:31 GMT
- Authors: Victor Gallego
- Abstract summary: We leverage the Bradley-Terry preference model to develop a fast adaptation method that efficiently fine-tunes the original model.
Extensive evidence of the capabilities of this framework is provided through experiments in different domains related to multimodal text and image understanding.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, large multimodal models, such as CLIP and Stable Diffusion, have
experienced tremendous success in both foundations and applications.
However, as these models increase in parameter size and computational
requirements, it becomes more challenging for users to personalize them for
specific tasks or preferences. In this work, we address the problem of adapting
these models towards sets of particular human preferences, aligning the
retrieved or generated images with the preferences of the user. We leverage the
Bradley-Terry preference model to develop a fast adaptation method that
efficiently fine-tunes the original model, with few examples and with minimal
computing resources. Extensive evidence of the capabilities of this framework
is provided through experiments in different domains related to multimodal text
and image understanding, including preference prediction as a reward model, and
generation tasks.
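For readers unfamiliar with the Bradley-Terry model named in the abstract, the following is a minimal sketch of the standard formulation, not the paper's implementation. It assumes each candidate image has a scalar score (for example, a text-image similarity from a model such as CLIP); the function names `bt_preference_prob` and `bt_nll` are hypothetical and chosen only for illustration.

```python
import math


def bt_preference_prob(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry probability that the first item is preferred.

    P(a > b) = exp(s_a) / (exp(s_a) + exp(s_b)) = sigmoid(s_a - s_b).
    The scores are assumed to come from some model; that part is not shown.
    """
    diff = score_preferred - score_rejected
    # Numerically stable sigmoid of the score difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def bt_nll(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood of one observed preference pair.

    -log sigmoid(s_w - s_l) = softplus(s_l - s_w), computed stably.
    """
    x = score_rejected - score_preferred
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bt_mean_loss(pairs) -> float:
    """Mean negative log-likelihood over (preferred, rejected) score pairs."""
    pairs = list(pairs)
    return sum(bt_nll(w, l) for w, l in pairs) / len(pairs)
```

Minimizing `bt_mean_loss` over a handful of labeled pairs pushes the scores of preferred items above those of rejected ones, which is the general idea behind preference-based fine-tuning; how the paper parameterizes and updates the scores is not specified in this listing.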
Related papers
- Personalized Preference Fine-tuning of Diffusion Models [75.22218338096316]
We introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences.
With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way.
Our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
arXiv Detail & Related papers (2025-01-11T22:38:41Z)
- EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
We propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks.
The model takes both images and instructions as inputs, and predicts the edited image tokens in a vanilla next-token paradigm.
We evaluate its effectiveness across diverse tasks on established benchmarks, showing competitive performance to various state-of-the-art task-specific methods.
arXiv Detail & Related papers (2025-01-08T18:59:35Z)
- You Only Submit One Image to Find the Most Suitable Generative Model [48.67303250592189]
We propose a novel setting called Generative Model Identification (GMI).
GMI aims to enable the user to identify the most appropriate generative model(s) for the user's requirements efficiently.
arXiv Detail & Related papers (2024-12-16T14:46:57Z)
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer [40.32254040909614]
We propose ACE, an All-round Creator and Editor, for visual generation tasks.
We first introduce a unified condition format termed Long-context Condition Unit (LCU).
We then propose a novel Transformer-based diffusion model that uses LCU as input, aiming for joint training across various generation and editing tasks.
arXiv Detail & Related papers (2024-09-30T17:56:27Z)
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
- MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models [34.611309081801345]
Large diffusion-based Text-to-Image (T2I) models have shown impressive generative powers for text-to-image generation.
In this paper, we propose a novel strategy to scale a generative model across new tasks with minimal compute.
arXiv Detail & Related papers (2024-04-15T17:55:56Z)
- Continuous Language Model Interpolation for Dynamic and Controllable Text Generation [7.535219325248997]
We focus on the challenging case where the model must dynamically adapt to diverse -- and often changing -- user preferences.
We leverage adaptation methods based on linear weight interpolation, casting them as continuous multi-domain interpolators.
We show that varying the weights yields predictable and consistent change in the model outputs.
arXiv Detail & Related papers (2024-04-10T15:55:07Z)
- A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.