Related papers: LLM Bandit: Cost-Efficient LLM Generation via Preference-Conditioned Dynamic Routing

URL: http://arxiv.org/abs/2502.02743v1
Date: Tue, 04 Feb 2025 22:09:43 GMT
Title: LLM Bandit: Cost-Efficient LLM Generation via Preference-Conditioned Dynamic Routing
Authors: Yang Li,
Abstract summary: We present a novel framework that formulates the LLM selection process as a multi-armed bandit problem. Our approach incorporates a preference-conditioned dynamic routing mechanism, allowing users to specify their preferences at inference time. Our method achieves significant improvements in both accuracy and cost-effectiveness across various LLM platforms.
Score: 3.090041654375235
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The rapid advancement in large language models (LLMs) has brought forth a diverse range of models with varying capabilities that excel in different tasks and domains. However, selecting the optimal LLM for user queries often involves a challenging trade-off between accuracy and cost, a problem exacerbated by the diverse demands of individual queries. In this work, we present a novel framework that formulates the LLM selection process as a multi-armed bandit problem, enabling dynamic and intelligent routing of queries to the most appropriate model. Our approach incorporates a preference-conditioned dynamic routing mechanism, allowing users to specify their preferences at inference time, thereby offering a customizable balance between performance and cost. Additionally, our selection policy is designed to generalize to unseen LLMs, ensuring adaptability to new models as they emerge. Experimental results demonstrate that our method achieves significant improvements in both accuracy and cost-effectiveness across various LLM platforms, showcasing the potential of our framework to adaptively optimize LLM selection in real-world scenarios.
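The abstract's core idea, treating model selection as a bandit whose objective depends on a user-supplied preference, can be illustrated with a minimal sketch. This is not the paper's method: the epsilon-greedy strategy, the linear scoring rule (estimated quality minus a weighted cost), and every name here (`PreferenceRouter`, `cost_weight`, the model labels and cost values) are assumptions chosen for illustration only.

```python
import random


class PreferenceRouter:
    """Hypothetical epsilon-greedy bandit that routes queries to models.

    Assumption: each model is scored as
        estimated_quality - cost_weight * cost
    where cost_weight is the user's preference, supplied at selection time.
    """

    def __init__(self, costs, epsilon=0.1, seed=0):
        self.costs = dict(costs)            # model name -> cost per query
        self.epsilon = epsilon              # exploration probability
        self.rng = random.Random(seed)
        self.counts = {m: 0 for m in self.costs}
        self.quality = {m: 0.0 for m in self.costs}  # running mean quality

    def score(self, model, cost_weight):
        # Scalarize the quality/cost trade-off for a given preference.
        return self.quality[model] - cost_weight * self.costs[model]

    def select(self, cost_weight):
        # Explore with probability epsilon, otherwise pick the best score.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.costs))
        return max(self.costs, key=lambda m: self.score(m, cost_weight))

    def update(self, model, observed_quality):
        # Incremental mean update of the model's estimated quality.
        self.counts[model] += 1
        n = self.counts[model]
        self.quality[model] += (observed_quality - self.quality[model]) / n


# Illustrative costs and quality observations (made up for this sketch).
router = PreferenceRouter({"small": 0.1, "large": 1.0}, epsilon=0.0)
router.update("small", 0.6)
router.update("large", 0.9)
print(router.select(cost_weight=0.0))  # large: quality is all that matters
print(router.select(cost_weight=1.0))  # small: cost now outweighs the gap
```

Because the quality estimates are stored separately from the preference, the same learned statistics serve any `cost_weight` chosen at inference time, which loosely mirrors the customizable balance the abstract describes.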

Related papers

Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression. LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z)
MixLLM: Dynamic Routing in Mixed Large Language Models [57.309520357563215]
Large Language Models (LLMs) have recently shown potential for artificial general intelligence; however, their usage is costly and their response latency is high. We develop MixLLM, a dynamic contextual-bandit-based routing system for query-LLM assignment.
arXiv Detail & Related papers (2025-02-09T02:26:15Z)
Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
PickLLM: Context-Aware RL-Assisted Large Language Model Routing [0.5325390073522079]
PickLLM is a lightweight framework that relies on Reinforcement Learning (RL) to route on-the-fly queries to available models. We demonstrate the speed of convergence for different learning rates and improvement in hard metrics such as cost per querying session and overall response latency.
arXiv Detail & Related papers (2024-12-12T06:27:12Z)
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making [85.24399869971236]
We aim to evaluate Large Language Models (LLMs) for embodied decision making. Existing evaluations tend to rely solely on a final success rate. We propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks.
arXiv Detail & Related papers (2024-10-09T17:59:00Z)
SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models [8.558834738072363]
Large language models (LLMs) have seen widespread adoption due to their remarkable performance across various applications. These individual LLMs show limitations in generalization and performance on complex tasks due to inherent training biases, model size constraints, and the quality or diversity of pre-training datasets. We introduce SelectLLM, which efficiently directs input queries to the most suitable subset of LLMs from a large pool.
arXiv Detail & Related papers (2024-08-16T06:11:21Z)
MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs [20.793892860721712]
We introduce MetaLLM, a framework that dynamically and intelligently routes each query to the optimal large language models (LLMs). By framing the selection problem as a multi-armed bandit, MetaLLM balances prediction accuracy and cost efficiency under uncertainty. Our experiments, conducted on popular LLM platforms such as OpenAI and Together AI, showcase MetaLLM's efficacy in real-world scenarios.
arXiv Detail & Related papers (2024-07-15T15:45:07Z)
SoupLM: Model Integration in Large Language and Multi-Modal Models [51.12227693121004]
Training large language models (LLMs) requires significant computing resources. Existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks.
arXiv Detail & Related papers (2024-07-11T05:38:15Z)
One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs). We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
Cost-Effective Online Multi-LLM Selection with Versatile Reward Models [30.892090566736652]
We introduce C2MAB-V, an online model for selecting and using large language models (LLMs). C2MAB-V is specifically tailored for various collaborative task types with different reward models. We show that C2MAB-V effectively balances performance and cost-efficiency with nine LLMs for three application scenarios.
arXiv Detail & Related papers (2024-05-26T14:38:24Z)
OptLLM: Optimal Assignment of Queries to Large Language Models [12.07164196530872]
We propose a framework for addressing the cost-effective query allocation problem for large language models (LLMs). Our framework, named OptLLM, provides users with a range of optimal solutions to choose from, aligning with their budget constraints and performance preferences. To evaluate the effectiveness of OptLLM, we conduct extensive experiments on various types of tasks, including text classification, question answering, sentiment analysis, reasoning, and log parsing.
arXiv Detail & Related papers (2024-05-24T01:05:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.