Related papers: Cost-Effective Online Multi-LLM Selection with Versatile Reward Models

Cost-Effective Online Multi-LLM Selection with Versatile Reward Models

URL: http://arxiv.org/abs/2405.16587v2
Date: Wed, 02 Oct 2024 13:22:27 GMT
Title: Cost-Effective Online Multi-LLM Selection with Versatile Reward Models
Authors: Xiangxiang Dai, Jin Li, Xutong Liu, Anqi Yu, John C. S. Lui,
Abstract summary: We introduce the textitC2MAB-V, an online model for selecting and using large language models (LLMs) textitC2MAB-V is specifically tailored for various collaborative task types with different reward models. We show that textitC2MAB-V effectively balances performance and cost-efficiency with nine LLMs for three application scenarios.
Score: 30.892090566736652
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the rapid advancement of large language models (LLMs), the diversity of multi-LLM tasks and the variability in their pricing structures have become increasingly important, as costs can vary greatly between different LLMs. To tackle these challenges, we introduce the \textit{C2MAB-V}, a \underline{C}ost-effective \underline{C}ombinatorial \underline{M}ulti-armed \underline{B}andit with \underline{V}ersatile reward models for optimal LLM selection and usage. This online model differs from traditional static approaches or those reliant on a single LLM without cost consideration. With multiple LLMs deployed on a scheduling cloud and a local server dedicated to handling user queries, \textit{C2MAB-V} facilitates the selection of multiple LLMs over a combinatorial search space, specifically tailored for various collaborative task types with different reward models. Based on our designed online feedback mechanism and confidence bound technique, \textit{C2MAB-V} can effectively address the multi-LLM selection challenge by managing the exploration-exploitation trade-off across different models, while also balancing cost and reward for diverse tasks. The NP-hard integer linear programming problem for selecting multiple LLMs with trade-off dilemmas is addressed by: i) decomposing the integer problem into a relaxed form by the local server, ii) utilizing a discretization rounding scheme that provides optimal LLM combinations by the scheduling cloud, and iii) continual online updates based on feedback. Theoretically, we prove that \textit{C2MAB-V} offers strict guarantees over versatile reward models, matching state-of-the-art results for regret and violations in some degenerate cases. Empirically, we show that \textit{C2MAB-V} effectively balances performance and cost-efficiency with nine LLMs for three application scenarios.

Related papers

A Trustworthy Multi-LLM Network: Challenges,Solutions, and A Use Case [59.58213261128626]
We propose a blockchain-enabled collaborative framework that connects multiple Large Language Models (LLMs) into a Trustworthy Multi-LLM Network (MultiLLMN)<n>This architecture enables the cooperative evaluation and selection of the most reliable and high-quality responses to complex network optimization problems.
arXiv Detail & Related papers (2025-05-06T05:32:46Z)
Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings [25.184186431458862]
Large language models (LLMs) can exhibit multi-task-solving capabilities through multiple dialogues and multi-modal data sources. These unique characteristics of LLMs, together with their large model size, make their deployment more challenging. We design TMO, a local-cloud LLM inference system with Three-M Offloading: Multi-modal, Multi-task, and Multi-dialogue.
arXiv Detail & Related papers (2025-02-16T06:18:28Z)
MixLLM: Dynamic Routing in Mixed Large Language Models [57.309520357563215]
Large Language Models (LLMs) exhibit potential artificial generic intelligence recently, however, their usage is costly with high response latency. We develop MixLLM, a dynamic contextual-bandit-based routing system for query-LLM assignment.
arXiv Detail & Related papers (2025-02-09T02:26:15Z)
LLM Bandit: Cost-Efficient LLM Generation via Preference-Conditioned Dynamic Routing [3.090041654375235]
We present a novel framework that formulates the LLM selection process as a multi-armed bandit problem. Our approach incorporates a preference-conditioned dynamic routing mechanism, allowing users to specify their preferences at inference time. Our method achieves significant improvements in both accuracy and cost-effectiveness across various LLM platforms.
arXiv Detail & Related papers (2025-02-04T22:09:43Z)
PickLLM: Context-Aware RL-Assisted Large Language Model Routing [0.5325390073522079]
PickLLM is a lightweight framework that relies on Reinforcement Learning (RL) to route on-the-fly queries to available models. We demonstrate the speed of convergence for different learning rates and improvement in hard metrics such as cost per querying session and overall response latency.
arXiv Detail & Related papers (2024-12-12T06:27:12Z)
LLM Chain Ensembles for Scalable and Accurate Data Annotation [1.7388851660609117]
Large language models (LLMs) can perform zero-shot classification, but large-scale deployment can be expensive. This paper introduces an LLM chain ensemble methodology that aligns multiple LLMs in a sequence, routing data subsets to subsequent models. Our results show that the chain ensemble method often exceeds the performance of the best individual model in the chain and achieves substantial cost savings.
arXiv Detail & Related papers (2024-10-16T20:03:51Z)
SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models [8.558834738072363]
Large language models (LLMs) have gained increased popularity due to their remarkable success across various tasks. However, individual LLMs have limitations when applied to complex tasks because of such factors as training biases, model sizes, and the datasets used. We introduce SelectLLM, a novel algorithm that directs input queries to the most suitable subset of LLMs from a large pool.
arXiv Detail & Related papers (2024-08-16T06:11:21Z)
SoupLM: Model Integration in Large Language and Multi-Modal Models [51.12227693121004]
Training large language models (LLMs) requires significant computing resources. Existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks.
arXiv Detail & Related papers (2024-07-11T05:38:15Z)
Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges [5.934258790280767]
Multimodal Large Language Models (MLLMs) harness comprehensive knowledge spanning text, images, and audio to adeptly tackle complex problems. This study explores the ability of MLLMs in visually solving the Traveling Salesman Problem (TSP) and Multiple Traveling Salesman Problem (mTSP) We introduce a novel approach employing multiple specialized agents within the MLLM framework, each dedicated to optimizing solutions for these challenges.
arXiv Detail & Related papers (2024-06-26T07:12:06Z)
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models [79.46938238953916]
Fine-tuning large language models (LLMs) to diverse applications is crucial to meet complex demands. Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs. In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs.
arXiv Detail & Related papers (2024-06-13T07:57:27Z)
Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection [80.63946798650653]
Decision centers on whether to use a large LLM with better performance or a smaller one with reduced costs. We propose a simpler solution; we use only the uncertainty of the generations of the small LLM as the decision criterion. Our experiments reveal this simple solution optimally balances cost and performance, outperforming existing methods on 25 out of 27 experimental setups.
arXiv Detail & Related papers (2024-05-03T14:38:59Z)
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs [3.450141240227484]
We propose a lightweight method for any-precision quantization of Large Language Models (LLMs) Our solution significantly reduces the high costs of deploying multiple, different-sized LLMs. All the supported LLMs with varying bit-widths demonstrate state-of-the-art model quality and inference throughput.
arXiv Detail & Related papers (2024-02-16T09:06:06Z)
Lightweight In-Context Tuning for Multimodal Unified Models [57.10831399642176]
MultiModal In-conteXt Tuning (M$2$IXT) is a lightweight module to enhance the ICL capabilities of multimodal unified models. When tuned on as little as 50K multimodal data, M$2$IXT can boost the few-shot ICL performance significantly.
arXiv Detail & Related papers (2023-10-08T10:47:24Z)
LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset. Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
Controllable Pareto Multi-Task Learning [55.945680594691076]
A multi-task learning system aims at solving multiple related tasks at the same time. With a fixed model capacity, the tasks would be conflicted with each other, and the system usually has to make a trade-off among learning all of them together. This work proposes a novel controllable multi-task learning framework, to enable the system to make real-time trade-off control among different tasks with a single model.
arXiv Detail & Related papers (2020-10-13T11:53:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.