Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection
- URL: http://arxiv.org/abs/2405.02134v1
- Date: Fri, 3 May 2024 14:38:59 GMT
- Title: Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection
- Authors: Guillem RamÃrez, Alexandra Birch, Ivan Titov,
- Abstract summary: Decision centers on whether to use a large LLM with better performance or a smaller one with reduced costs.
We propose a simpler solution; we use only the uncertainty of the generations of the small LLM as the decision criterion.
Our experiments reveal this simple solution optimally balances cost and performance, outperforming existing methods on 25 out of 27 experimental setups.
- Score: 80.63946798650653
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Researchers and practitioners operating on a limited budget face the cost-performance trade-off dilemma. The challenging decision often centers on whether to use a large LLM with better performance or a smaller one with reduced costs. This has motivated recent research in the optimisation of LLM calls. Either a cascading strategy is used, where a smaller LLM or both are called sequentially, or a routing strategy is used, where only one model is ever called. Both scenarios are dependent on a decision criterion which is typically implemented by an extra neural model. In this work, we propose a simpler solution; we use only the uncertainty of the generations of the small LLM as the decision criterion. We compare our approach with both cascading and routing strategies using three different pairs of pre-trained small and large LLMs, on nine different tasks and against approaches that require an additional neural model. Our experiments reveal this simple solution optimally balances cost and performance, outperforming existing methods on 25 out of 27 experimental setups.
Related papers
- MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs [21.689490112983677]
We introduce MetaLLM, a framework that dynamically routes each query to the optimal large language models (LLMs) for classification tasks.
By framing the selection problem as a multi-armed bandit, MetaLLM balances prediction accuracy and cost efficiency under uncertainty.
Our experiments, conducted on popular LLM platforms, showcase MetaLLM's efficacy in real-world scenarios.
arXiv Detail & Related papers (2024-07-15T15:45:07Z) - Efficient Sequential Decision Making with Large Language Models [19.083642464977224]
This paper focuses on extending the success of large language models (LLMs) to sequential decision making.
Existing efforts either (i) re-train or finetune LLMs for decision making, or (ii) design prompts for pretrained LLMs.
We propose a new approach that leverages online model selection algorithms to efficiently incorporate LLMs agents into sequential decision making.
arXiv Detail & Related papers (2024-06-17T22:13:22Z) - Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models [79.46938238953916]
Fine-tuning large language models (LLMs) to diverse applications is crucial to meet complex demands.
Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs.
In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs.
arXiv Detail & Related papers (2024-06-13T07:57:27Z) - Value Augmented Sampling for Language Model Alignment and Personalization [39.070662999014836]
We present a new framework for reward optimization, Value Augmented Sampling (VAS)
VAS solves for the optimal reward-maximizing policy without co-training the policy and the value function.
Our algorithm unlocks the new capability of composing several rewards and controlling the extent of each one during deployment time.
arXiv Detail & Related papers (2024-05-10T17:59:04Z) - Which LLM to Play? Convergence-Aware Online Model Selection with
Time-Increasing Bandits [43.65904435249823]
We propose a time-increasing bandit algorithm TI-UCB, which effectively predicts the increase of model performances.
Our results highlight the importance of utilizing increasing-then-converging pattern for more efficient and economic model selection.
arXiv Detail & Related papers (2024-03-11T23:52:46Z) - Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs [60.40396361115776]
This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in large language models (LLMs) with a slim proxy model.
We employ a proxy model which has far fewer parameters, and take its answers as answers.
Heuristic answers are then utilized to predict the knowledge required to answer the user question, as well as the known and unknown knowledge within the LLM.
arXiv Detail & Related papers (2024-02-19T11:11:08Z) - Leaving the Nest: Going Beyond Local Loss Functions for
Predict-Then-Optimize [57.22851616806617]
We show that our method achieves state-of-the-art results in four domains from the literature.
Our approach outperforms the best existing method by nearly 200% when the localness assumption is broken.
arXiv Detail & Related papers (2023-05-26T11:17:45Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an em early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $tilde O(|S||B|(gamma)-3epsilon-2)$ for finding the Nash equilibrium (NE) value up to some $epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.