TREACLE: Thrifty Reasoning via Context-Aware LLM and Prompt Selection
- URL: http://arxiv.org/abs/2404.13082v1
- Date: Wed, 17 Apr 2024 05:56:49 GMT
- Title: TREACLE: Thrifty Reasoning via Context-Aware LLM and Prompt Selection
- Authors: Xuechen Zhang, Zijian Huang, Ege Onur Taga, Carlee Joe-Wong, Samet Oymak, Jiasi Chen,
- Abstract summary: We propose TREACLE, a reinforcement learning policy that jointly selects the model and prompting scheme while respecting the user's monetary cost and latency constraints.
Our evaluations show that TREACLE enables cost savings of up to 85% compared to baselines while maintaining high accuracy.
- Score: 31.972053219549757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent successes in natural language processing have led to the proliferation of large language models (LLMs) by multiple providers. Each LLM offering has different inference accuracy, monetary cost, and latency, and their accuracy further depends on the exact wording of the question (i.e., the specific prompt). At the same time, users often have a limit on monetary budget and latency to answer all their questions, and they do not know which LLMs to choose for each question to meet their accuracy and long-term budget requirements. To navigate this rich design space, we propose TREACLE (Thrifty Reasoning via Context-Aware LLM and Prompt Selection), a reinforcement learning policy that jointly selects the model and prompting scheme while respecting the user's monetary cost and latency constraints. TREACLE uses the problem context, including question text embeddings (reflecting the type or difficulty of a query) and the response history (reflecting the consistency of previous responses) to make smart decisions. Our evaluations on standard reasoning datasets (GSM8K, CSQA, and LLC ) with various LLMs and prompts show that TREACLE enables cost savings of up to 85% compared to baselines while maintaining high accuracy. Importantly, it provides the user with the ability to gracefully trade off accuracy for cost.
Related papers
- Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval [55.63711219190506]
Large language models (LLMs) often struggle with posing the right search queries.
We introduce $underlineLe$arning to $underlineRe$trieve by $underlineT$rying (LeReT)
LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%.
arXiv Detail & Related papers (2024-10-30T17:02:54Z) - CLR-Bench: Evaluating Large Language Models in College-level Reasoning [17.081788240112417]
Large language models (LLMs) have demonstrated their remarkable performance across various language understanding tasks.
We present CLR-Bench to comprehensively evaluate the LLMs in complex college-level reasoning.
arXiv Detail & Related papers (2024-10-23T04:55:08Z) - Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established.
This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.
We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z) - Cost-efficient Knowledge-based Question Answering with Large Language Models [28.816821631082856]
Knowledge-based question answering (KBQA) is widely used in many scenarios that necessitate domain knowledge.
Large language models (LLMs) bring opportunities to KBQA, while their costs are significantly higher and absence of domain-specific knowledge during pre-training.
We propose Coke, a novel cost-efficient strategy for KBQA with LLMs, modeled as a tailored multi-armed bandit problem.
arXiv Detail & Related papers (2024-05-27T16:37:34Z) - OptLLM: Optimal Assignment of Queries to Large Language Models [12.07164196530872]
We propose a framework for addressing the cost-effective query allocation problem for large language models (LLMs)
Our framework, named OptLLM, provides users with a range of optimal solutions to choose from, aligning with their budget constraints and performance preferences.
To evaluate the effectiveness of OptLLM, we conduct extensive experiments on various types of tasks, including text classification, question answering, sentiment analysis, reasoning, and log parsing.
arXiv Detail & Related papers (2024-05-24T01:05:37Z) - CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models [60.59638232596912]
We introduce CLAMBER, a benchmark for evaluating large language models (LLMs)
Building upon the taxonomy, we construct 12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs.
Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries.
arXiv Detail & Related papers (2024-05-20T14:34:01Z) - Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts [50.06633829833144]
Large Language Models (LLMs) are effective in performing various NLP tasks, but struggle to handle tasks that require extensive, real-world knowledge.
We propose a benchmark that requires knowledge of long-tail facts for answering the involved questions.
Our experiments show that LLMs alone struggle with answering these questions, especially when the long-tail level is high or rich knowledge is required.
arXiv Detail & Related papers (2024-05-10T15:10:20Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - Question Answering as Programming for Solving Time-Sensitive Questions [84.07553016489769]
Question answering plays a pivotal role in human daily life because it involves our acquisition of knowledge about the world.
Recently, Large Language Models (LLMs) have shown remarkable intelligence in question answering.
This can be attributed to the LLMs' inability to perform rigorous reasoning based on surface-level text semantics.
We propose a novel approach where we reframe the $textbfQ$uestion $textbfA$rogrogering task.
arXiv Detail & Related papers (2023-05-23T16:35:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.