An Online Learning Approach to Prompt-based Selection of Generative Models
- URL: http://arxiv.org/abs/2410.13287v2
- Date: Mon, 10 Feb 2025 04:20:39 GMT
- Title: An Online Learning Approach to Prompt-based Selection of Generative Models
- Authors: Xiaoyan Hu, Ho-fung Leung, Farzan Farnia
- Abstract summary: An online identification of the best generation model for various input prompts can reduce the costs associated with querying sub-optimal models.
We propose an online learning framework to predict the best data generation model for a given input prompt.
Our experiments on real and simulated text-to-image and image-to-text generative models show that RFF-UCB performs successfully in identifying the best generation model.
- Score: 23.91197677628145
- License:
- Abstract: Selecting a sample generation scheme from multiple text-based generative models is typically addressed by choosing the model that maximizes an averaged evaluation score. However, this score-based selection overlooks the possibility that different models achieve the best generation performance for different types of text prompts. An online identification of the best generation model for various input prompts can reduce the costs associated with querying sub-optimal models. In this work, we explore the possibility of varying rankings of text-based generative models for different text prompts and propose an online learning framework to predict the best data generation model for a given input prompt. The proposed PAK-UCB algorithm addresses a contextual bandit (CB) setting with shared context variables across the arms, utilizing the generated data to update kernel-based functions that predict the score of each model available for unseen text prompts. Additionally, we leverage random Fourier features (RFF) to accelerate the online learning process of PAK-UCB and establish a $\widetilde{\mathcal{O}}(\sqrt{T})$ regret bound for the proposed RFF-based CB algorithm over $T$ iterations. Our numerical experiments on real and simulated text-to-image and image-to-text generative models show that RFF-UCB performs successfully in identifying the best generation model across different sample types.
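To make the bandit formulation above concrete, the following is a minimal sketch of an RFF-based upper-confidence-bound selector in the spirit of PAK-UCB / RFF-UCB. It is not the authors' implementation; the prompt embedding, Gaussian-kernel bandwidth `gamma`, ridge parameter `lam`, and exploration weight `alpha` are illustrative placeholders.

```python
import numpy as np

class RFFUCBSketch:
    """Per-arm ridge regression on random Fourier features, with a LinUCB-style
    exploration bonus. Each arm corresponds to one candidate generative model."""

    def __init__(self, n_arms, dim, n_features=100, gamma=1.0, lam=1.0, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random Fourier features approximating a Gaussian kernel exp(-gamma ||x - x'||^2)
        self.W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_features, dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, n_features)
        self.alpha = alpha
        self.A = [lam * np.eye(n_features) for _ in range(n_arms)]  # regularized Gram matrices
        self.y = [np.zeros(n_features) for _ in range(n_arms)]      # feature-weighted score sums

    def _phi(self, x):
        return np.sqrt(2.0 / len(self.b)) * np.cos(self.W @ x + self.b)

    def select(self, x):
        """Pick the arm with the largest upper confidence bound for prompt embedding x."""
        phi = self._phi(x)
        ucb = []
        for A, y in zip(self.A, self.y):
            A_inv = np.linalg.inv(A)
            mean = phi @ (A_inv @ y)                         # predicted evaluation score
            bonus = self.alpha * np.sqrt(phi @ A_inv @ phi)  # exploration term
            ucb.append(mean + bonus)
        return int(np.argmax(ucb))

    def update(self, arm, x, score):
        """Record the observed evaluation score of the chosen model on prompt x."""
        phi = self._phi(x)
        self.A[arm] += np.outer(phi, phi)
        self.y[arm] += score * phi
```

In use, each round would embed the incoming prompt, call `select`, query the chosen generative model, score its output with an automatic evaluation metric, and feed that score back through `update`.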
Related papers
- Be More Diverse than the Most Diverse: Online Selection of Diverse Mixtures of Generative Models [33.04472814852163]
In this work, we explore the selection of a mixture of multiple generative models.
We propose an online learning approach called Mixture Upper Confidence Bound (Mixture-UCB).
arXiv Detail & Related papers (2024-12-23T14:48:17Z)
- Conditional Vendi Score: An Information-Theoretic Approach to Diversity Evaluation of Prompt-based Generative Models [15.40817940713399]
We introduce the Conditional-Vendi score based on $H(X|T)$ to quantify the internal diversity of the model.
We conduct several numerical experiments to show the correlation between the Conditional-Vendi score and the internal diversity of text-conditioned generative models.
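As background for this entry, here is a minimal sketch of the underlying (unconditional) Vendi score, the exponential of the von Neumann entropy of a normalized similarity matrix; the conditioning on the prompt variable $T$ that yields $H(X|T)$ in the paper is not reproduced here.

```python
import numpy as np

def vendi_score(K):
    """Vendi score: exp of the von Neumann entropy of K / n, where K is an
    n x n positive semi-definite similarity matrix with ones on its diagonal.
    (Background sketch only; the Conditional-Vendi score of the paper
    additionally conditions on the text prompt T.)"""
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    eigvals = eigvals[eigvals > 1e-12]   # discard numerical zeros
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))
```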
arXiv Detail & Related papers (2024-11-05T05:30:39Z)
- Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe [10.34105218186634]
In this paper, we study how to contrastively train text embedding models in a compute-optimal fashion.
Our innovation is an algorithm that produces optimal configurations of model sizes, data quantities, and fine-tuning methods for text-embedding models at different computational budget levels.
arXiv Detail & Related papers (2024-06-06T15:22:33Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and the scarcity of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to models that process text input and generate high-fidelity images from text descriptions.
Diffusion models are one prominent type of generative model that produces images through the systematic introduction of noise over repeated steps, as sketched below.
In the era of large models, scaling up model size and integrating large language models have further improved the performance of TTI models.
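To illustrate the repeated noising process mentioned in this entry, here is a minimal sketch of a standard DDPM-style forward process in closed form; the schedule values and function are illustrative defaults, not taken from the survey.

```python
import numpy as np

def ddpm_forward_noising(x0, t, T=1000, beta_start=1e-4, beta_end=0.02, rng=None):
    """Sample x_t from the forward (noising) process in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)."""
    if rng is None:
        rng = np.random.default_rng(0)
    betas = np.linspace(beta_start, beta_end, T)     # linear noise schedule
    alpha_bar = np.cumprod(1.0 - betas)              # cumulative signal retention
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
```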
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
- Generating Images with Multimodal Language Models [78.6660334861137]
We propose a method to fuse frozen text-only large language models with pre-trained image encoder and decoder models.
Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue.
arXiv Detail & Related papers (2023-05-26T19:22:03Z)
- Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models [52.29800567587504]
We propose a learnable sampling model, Text-Conditioned Token Selection (TCTS), to select optimal tokens via localized supervision with text information.
TCTS improves not only the image quality but also the semantic alignment of the generated images with the given texts.
We validate the efficacy of TCTS combined with Frequency Adaptive Sampling (FAS) with various generative tasks, demonstrating that it significantly outperforms the baselines in image-text alignment and image quality.
arXiv Detail & Related papers (2023-04-04T03:52:49Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It uses a retrieval-then-optimization procedure to synthesize pseudo text features.
It benefits a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
- Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing [78.8500633981247]
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning".
Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly.
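As a concrete illustration of this paradigm, the sketch below performs prompt-based classification by scoring filled-in templates with a language model; the `lm_log_prob` callable, template string, and verbalizer are hypothetical placeholders, not from the survey.

```python
def prompt_classify(x, lm_log_prob, template="{x} Overall, it was {y}.",
                    verbalizer=None):
    """Score each candidate label by the LM log-probability of the filled-in
    prompt and return the highest-scoring label."""
    verbalizer = verbalizer or {"positive": "great", "negative": "terrible"}
    scores = {label: lm_log_prob(template.format(x=x, y=word))
              for label, word in verbalizer.items()}
    return max(scores, key=scores.get)
```

Here the language model is used only to score text, so the same frozen model can handle new tasks by changing the template and verbalizer rather than retraining a classifier.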
arXiv Detail & Related papers (2021-07-28T18:09:46Z)
- Topical Language Generation using Transformers [4.795530213347874]
This paper presents a novel approach for Topical Language Generation (TLG) by combining a pre-trained LM with topic modeling information.
We extend our model by introducing new parameters and functions to influence the quantity of the topical features presented in the generated text.
Our experimental results demonstrate that our model outperforms the state-of-the-art results on coherency, diversity, and fluency while being faster in decoding.
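The snippet below is one plausible reading of how topic-model information could steer a pre-trained LM's decoding: topical words receive a logit bonus controlled by a weight `gamma`. This is an illustrative assumption, not the paper's exact parameterization.

```python
import numpy as np

def topical_next_token_logits(lm_logits, topic_word_probs, gamma=5.0):
    """Add a bonus proportional to log P(word | topic) to the LM's next-token
    logits, so sampling favors words associated with the chosen topic."""
    return lm_logits + gamma * np.log(topic_word_probs + 1e-12)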
arXiv Detail & Related papers (2021-03-11T03:45:24Z)