PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models
- URL: http://arxiv.org/abs/2505.18901v1
- Date: Sat, 24 May 2025 23:26:33 GMT
- Title: PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models
- Authors: Xiaoyan Hu, Lauren Pick, Ho-fung Leung, Farzan Farnia
- Abstract summary: We introduce PromptWise, an online learning framework designed to assign a sequence of prompts to a group of large language models. PromptWise strategically queries cheaper models first, progressing to more expensive options only if the lower-cost models fail to adequately address a given prompt. The results highlight that PromptWise consistently outperforms cost-unaware baseline methods.
- Score: 22.732551029493987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of generative AI models has provided users with numerous options to address their prompts. When selecting a generative AI model for a given prompt, users should consider not only the performance of the chosen model but also its associated service cost. The principle guiding such consideration is to select the least expensive model among the available satisfactory options. However, existing model-selection approaches typically prioritize performance, overlooking pricing differences between models. In this paper, we introduce PromptWise, an online learning framework designed to assign a sequence of prompts to a group of large language models (LLMs) in a cost-effective manner. PromptWise strategically queries cheaper models first, progressing to more expensive options only if the lower-cost models fail to adequately address a given prompt. Through numerical experiments, we demonstrate PromptWise's effectiveness across various tasks, including puzzles of varying complexity and code generation/translation tasks. The results highlight that PromptWise consistently outperforms cost-unaware baseline methods, emphasizing that directly assigning prompts to the most expensive models can lead to higher costs and potentially lower average performance.
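To make the escalation strategy concrete, below is a minimal sketch of a cheapest-first cascade with bandit-style success estimates. This is an illustration under assumptions, not the paper's algorithm: the `Model` class, the Beta-posterior success sampling, the `skip_below` threshold, and the `query_fn`/`is_satisfactory` callbacks are all hypothetical.

```python
# Hypothetical sketch of cost-aware prompt assignment; not PromptWise itself.
import random
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost: float          # price charged per query
    successes: int = 1   # Beta(1, 1) prior over the success probability
    failures: int = 1

    def sample_success_rate(self) -> float:
        # Thompson-style draw from the Beta posterior over success probability
        return random.betavariate(self.successes, self.failures)

def assign_prompt(models, prompt, query_fn, is_satisfactory, skip_below=0.3):
    """Try models from cheapest to most expensive, stopping at the first
    satisfactory answer and updating per-model success statistics."""
    total_cost = 0.0
    ordered = sorted(models, key=lambda m: m.cost)
    for i, model in enumerate(ordered):
        # A cheap model whose sampled success rate looks hopeless may be
        # skipped, but the most expensive model is always tried as a fallback.
        if i < len(ordered) - 1 and model.sample_success_rate() < skip_below:
            continue
        answer = query_fn(model.name, prompt)
        total_cost += model.cost
        if is_satisfactory(prompt, answer):
            model.successes += 1
            return answer, total_cost
        model.failures += 1
    return None, total_cost
```

The sketch mirrors the abstract's guiding principle: a prompt reaches an expensive model only after cheaper models have failed on it or have accumulated a poor success record.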
Related papers
- Reasoning Models are Test Exploiters: Rethinking Multiple-Choice [10.085788712670487]
Large Language Models (LLMs) are asked to choose among a fixed set of options in question-answering domains. Multiple-choice question-answering (MCQA) has been a good proxy for the downstream performance of models as long as they were allowed to perform chain-of-thought reasoning. We conclude that MCQA is no longer a good proxy for assessing the downstream performance of state-of-the-art models.
arXiv Detail & Related papers (2025-07-21T07:49:32Z) - ORPP: Self-Optimizing Role-playing Prompts to Enhance Language Model Capabilities [64.24517317344959]
High-quality prompts are crucial for eliciting outstanding performance from large language models on complex tasks. We propose ORPP, a framework that enhances model performance by optimizing and generating role-playing prompts. We show that ORPP not only matches but in most cases surpasses existing mainstream prompt optimization methods in terms of performance.
arXiv Detail & Related papers (2025-06-03T05:51:35Z) - Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation [55.42794740244581]
We propose a novel prompt optimization framework designed to rephrase a simple user prompt into a sophisticated prompt for a text-to-image model. Specifically, we employ large vision-language models (LVLMs) as the solver to rewrite the user prompt and, concurrently, as a reward model to score the aesthetics and alignment of the images generated by the optimized prompt. Instead of laborious human feedback, we exploit the prior knowledge of the LVLM to provide rewards, i.e., AI feedback.
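As a rough illustration of the rewrite-and-score loop described in this summary, the sketch below alternates an LVLM rewrite step with an AI-feedback scoring step. All callbacks (`rewrite_prompt`, `generate_image`, `score_image`) are hypothetical stand-ins for model calls, not the paper's API.

```python
# Hedged sketch of an LVLM rewrite-and-score loop; callbacks are
# hypothetical stand-ins for the actual model calls.
def optimize_prompt(user_prompt, rewrite_prompt, generate_image, score_image,
                    n_rounds=5):
    """Greedily keep the rewritten prompt whose generated image scores best."""
    best_prompt, best_score = user_prompt, float("-inf")
    for _ in range(n_rounds):
        candidate = rewrite_prompt(best_prompt)   # LVLM as solver
        image = generate_image(candidate)         # text-to-image model
        score = score_image(image, user_prompt)   # LVLM as reward (AI feedback)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt
```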
arXiv Detail & Related papers (2025-05-22T15:05:07Z) - LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead [19.573553157421774]
LightRouter is a novel framework designed to systematically select and integrate a small subset of LLMs from a larger pool. Experiments demonstrate that LightRouter matches or outperforms widely-used ensemble baselines, achieving up to a 25% improvement in accuracy. This work introduces a practical approach for efficient LLM selection and provides valuable insights into optimal strategies for model combination.
arXiv Detail & Related papers (2025-05-22T04:46:04Z) - EfficientLLaVA:Generalizable Auto-Pruning for Large Vision-language Models [64.18350535770357]
We propose an automatic pruning method for large vision-language models to enhance the efficiency of multimodal reasoning. Our approach leverages only a small number of samples to search for the desired pruning policy. We conduct extensive experiments on the ScienceQA, VizWiz, MM-Vet, and LLaVA-Bench datasets for the task of visual question answering.
arXiv Detail & Related papers (2025-03-19T16:07:04Z) - Exploring Task-Level Optimal Prompts for Visual In-Context Learning [20.34945396590862]
We propose task-level prompting to reduce the cost of searching for prompts during the inference stage. We show that our proposed method can identify near-optimal prompts and reach the best VICL performance at minimal cost.
arXiv Detail & Related papers (2025-01-15T14:52:20Z) - Pricing and Competition for Generative AI [3.8677478583601776]
We explore the problem of how developers of new generative AI software can release and price their technology.
We first develop a comparison of two different models for a specific task with respect to user cost-effectiveness.
We then model the pricing problem of generative AI software as a game between two different companies.
arXiv Detail & Related papers (2024-11-04T22:52:45Z) - Towards Fundamentally Scalable Model Selection: Asymptotically Fast Update and Selection [40.85209520973634]
An ideal model selection scheme should support two operations efficiently over a large pool of candidate models: updating the pool and selecting a model from it.
Previous solutions to model selection require high computational complexity for at least one of these two operations.
We present Standardized Embedder, an empirical realization of isolated model embedding.
arXiv Detail & Related papers (2024-06-11T17:57:49Z) - Efficient Prompt Optimization Through the Lens of Best Arm Identification [50.56113809171805]
This work provides a principled framework, TRIPLE, to efficiently perform prompt selection under an explicit budget constraint.
It is built on a novel connection established between prompt optimization and fixed-budget best arm identification (BAI-FB) in multi-armed bandits (MAB).
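For intuition about BAI-FB, here is a minimal sequential-halving sketch in which candidate prompts play the role of bandit arms. It is a standard fixed-budget routine under an assumed `evaluate` callback (one noisy score per query), not TRIPLE itself.

```python
# Standard fixed-budget best-arm identification via sequential halving;
# the `evaluate` callback is an assumed noisy scoring function.
import math

def sequential_halving(prompts, evaluate, budget):
    """Return the best candidate prompt found within `budget` evaluations:
    halve the candidate set each round, spending the per-round budget
    evenly across the surviving prompts."""
    arms = list(prompts)
    rounds = max(1, math.ceil(math.log2(len(arms))))
    for _ in range(rounds):
        if len(arms) == 1:
            break
        pulls = max(1, budget // (rounds * len(arms)))
        means = {p: sum(evaluate(p) for _ in range(pulls)) / pulls
                 for p in arms}
        arms = sorted(arms, key=means.get, reverse=True)[:max(1, len(arms) // 2)]
    return arms[0]
```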
arXiv Detail & Related papers (2024-02-15T05:31:13Z) - Modeling Choice via Self-Attention [8.394221523847325]
We show that our attention-based choice model is a low-rank generalization of the Halo Multinomial Logit (Halo-MNL) model.
We also establish the first realistic-scale benchmark for choice estimation on real data, conducting an evaluation of existing models.
arXiv Detail & Related papers (2023-11-11T11:13:07Z) - MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning [68.12870241637636]
We propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training.
Our empirical results indicate that MILO can train models $3\times$ to $10\times$ faster and tune hyperparameters $20\times$ to $75\times$ faster than full-dataset training or tuning, without degrading performance.
arXiv Detail & Related papers (2023-01-30T20:59:30Z) - Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explore the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters.
We apply multi-task learning to make the model learn to generalize to new tasks better.
Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)