Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image
Classification and Generation
- URL: http://arxiv.org/abs/2308.07929v2
- Date: Thu, 21 Sep 2023 14:53:31 GMT
- Title: Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image
Classification and Generation
- Authors: Victor Gallego
- Abstract summary: We leverage the Bradley-Terry preference model to develop a fast adaptation method that efficiently fine-tunes the original model.
Extensive evidence of the capabilities of this framework is provided through experiments in different domains related to multimodal text and image understanding.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, large multimodal models, such as CLIP and Stable Diffusion, have
experienced tremendous success in both foundations and applications.
However, as these models increase in parameter size and computational
requirements, it becomes more challenging for users to personalize them for
specific tasks or preferences. In this work, we address the problem of adapting
the previous models towards sets of particular human preferences, aligning the
retrieved or generated images with the preferences of the user. We leverage the
Bradley-Terry preference model to develop a fast adaptation method that
efficiently fine-tunes the original model, with few examples and with minimal
computing resources. Extensive evidence of the capabilities of this framework
is provided through experiments in different domains related to multimodal text
and image understanding, including preference prediction as a reward model, and
generation tasks.
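The core of the Bradley-Terry model is a pairwise win probability, P(a ≻ b) = σ(s_a − s_b), which turns a set of preference pairs into a differentiable loss over scalar scores (e.g. CLIP text-image similarities). A minimal sketch, with hypothetical function names and no relation to the paper's actual training code:

```python
import math

def bt_probability(score_a, score_b):
    """Bradley-Terry win probability: P(a preferred over b) = sigmoid(s_a - s_b)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

def bt_loss(score_preferred, score_rejected):
    """Negative log-likelihood of the observed preference; minimizing this
    pushes the preferred item's score above the rejected one's."""
    return -math.log(bt_probability(score_preferred, score_rejected))
```

Fast adaptation then amounts to backpropagating this loss through whatever produces the scores (here, a similarity head over the pretrained multimodal model) on a handful of labeled pairs.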
Related papers
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z) - PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences [6.398937923320069]
We propose PAL, a framework to model human preference complementary to existing pretraining strategies.
We show that PAL achieves competitive reward model accuracy compared to strong baselines.
arXiv Detail & Related papers (2024-06-12T17:54:54Z) - DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple and effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder.
DEEM exhibits enhanced robustness and a superior capacity to alleviate hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z) - MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models [34.611309081801345]
Large diffusion-based Text-to-Image (T2I) models have shown impressive generative powers for text-to-image generation.
In this paper, we propose a novel strategy to scale a generative model across new tasks with minimal compute.
arXiv Detail & Related papers (2024-04-15T17:55:56Z) - Continuous Language Model Interpolation for Dynamic and Controllable Text Generation [7.535219325248997]
We focus on the challenging case where the model must dynamically adapt to diverse -- and often changing -- user preferences.
We leverage adaptation methods based on linear weight interpolation, casting them as continuous multi-domain interpolators.
We show that varying the weights yields predictable and consistent change in the model outputs.
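Linear weight interpolation of the kind described above can be sketched in a few lines: given two fine-tuned parameter sets with identical structure, a single coefficient sweeps continuously between them (the flat-dict representation below is illustrative, not the paper's API):

```python
def interpolate_weights(weights_a, weights_b, alpha):
    """Linear interpolation between two models' parameters (as flat dicts).

    alpha=0.0 recovers model A exactly; alpha=1.0 recovers model B;
    intermediate values blend the two behaviors.
    """
    assert weights_a.keys() == weights_b.keys()
    return {name: (1.0 - alpha) * weights_a[name] + alpha * weights_b[name]
            for name in weights_a}
```

Because the interpolation is linear in alpha, sweeping the coefficient yields the predictable, consistent output changes the paper reports.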
arXiv Detail & Related papers (2024-04-10T15:55:07Z) - Orthogonal Adaptation for Modular Customization of Diffusion Models [42.51086622161094]
We address a new problem called Modular Customization, with the goal of efficiently merging customized models.
We introduce Orthogonal Adaptation, a method that encourages the customized models, which do not have access to each other during fine-tuning, to have orthogonal residual weights, so they can be merged with minimal interference.
Our proposed method is both simple and versatile, applicable to nearly all optimizable weights in the model architecture.
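The intuition can be illustrated with plain vectors: each customization learns a residual delta on top of the shared base weights, and if those deltas are (near-)orthogonal, summing them merges the models without one customization overwriting another. A toy sketch, not the paper's implementation:

```python
def dot(u, v):
    """Inner product of two weight vectors; zero means orthogonal."""
    return sum(x * y for x, y in zip(u, v))

def merge_customizations(base, residuals):
    """Merge independently fine-tuned residual deltas by summation onto the
    base weights; orthogonal residuals occupy disjoint directions and so
    do not interfere with each other."""
    merged = list(base)
    for delta in residuals:
        merged = [m + d for m, d in zip(merged, delta)]
    return merged
```

With residuals [0.5, 0, 0] and [0, 0.3, 0] (dot product zero), the merged model carries both customizations intact.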
arXiv Detail & Related papers (2023-12-05T02:17:48Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
Existing approaches for adapting pretrained models to vision-language tasks still rely on several key components that hinder their efficiency.
We instead direct effort toward efficient adaptation of existing models, augmenting Language Models with perception.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
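The parameter-budget claim is easy to verify with back-of-the-envelope arithmetic: a single linear projection between, say, a 768-dimensional perceptual encoder and a 4096-dimensional language model, plus one soft token, is a tiny fraction of a billion-parameter backbone (the dimensions here are illustrative, not the paper's exact configuration):

```python
def trainable_fraction(frozen_params, in_dim, out_dim, extra_tokens=1):
    """Fraction of all parameters that are trained when only one linear
    projection (weights + bias) and a few soft prompt tokens are learnable,
    while the backbone of `frozen_params` parameters stays frozen."""
    trainable = in_dim * out_dim + out_dim + extra_tokens * out_dim
    return trainable / (frozen_params + trainable)
```

For a 1B-parameter backbone this comes out well under 1%, consistent with the "more than 99% frozen" figure above.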
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z) - A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z) - Coreference Resolution without Span Representations [20.84150608402576]
We introduce a lightweight coreference model that removes the dependency on span representations, handcrafted features, and heuristics.
Our model performs competitively with the current end-to-end model, while being simpler and more efficient.
arXiv Detail & Related papers (2021-01-02T11:46:51Z) - Conditional Generative Modeling via Learning the Latent Space [54.620761775441046]
We propose a novel framework for conditional generation in multimodal spaces.
It uses latent variables to model generalizable learning patterns.
At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes.
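Inference-time latent optimization of this kind reduces to gradient descent on the latent variable while the generator stays fixed. A toy one-dimensional sketch with a stand-in linear "decoder" decode(z) = 2z (purely illustrative, not the paper's model):

```python
def optimize_latent(target, steps=200, lr=0.05):
    """Gradient descent on z to minimize (decode(z) - target)**2,
    where decode(z) = 2 * z stands in for a frozen generator."""
    z = 0.0
    for _ in range(steps):
        grad = 2.0 * (2.0 * z - target) * 2.0  # chain rule through decode
        z -= lr * grad
    return z
```

Starting the descent from several initial points (rather than z = 0 only) is how such methods recover multiple output modes for one condition.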
arXiv Detail & Related papers (2020-10-07T03:11:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.