FrugalGPT: How to Use Large Language Models While Reducing Cost and
Improving Performance
- URL: http://arxiv.org/abs/2305.05176v1
- Date: Tue, 9 May 2023 05:11:02 GMT
- Title: FrugalGPT: How to Use Large Language Models While Reducing Cost and
Improving Performance
- Authors: Lingjiao Chen and Matei Zaharia and James Zou
- Abstract summary: We review the cost associated with querying popular large language models (LLMs).
We discuss three types of strategies that users can exploit to reduce the inference cost associated with using LLMs.
Experiments show that FrugalGPT can match the performance of the best individual LLM with up to 98% cost reduction or improve the accuracy over GPT-4 by 4% with the same cost.
- Score: 36.94826820536239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a rapidly growing number of large language models (LLMs) that users
can query for a fee. We review the cost associated with querying popular LLM
APIs, e.g. GPT-4, ChatGPT, J1-Jumbo, and find that these models have
heterogeneous pricing structures, with fees that can differ by two orders of
magnitude. In particular, using LLMs on large collections of queries and text
can be expensive. Motivated by this, we outline and discuss three types of
strategies that users can exploit to reduce the inference cost associated with
using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As
an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM
cascade which learns which combinations of LLMs to use for different queries in
order to reduce cost and improve accuracy. Our experiments show that FrugalGPT
can match the performance of the best individual LLM (e.g. GPT-4) with up to
98% cost reduction or improve the accuracy over GPT-4 by 4% with the same cost.
The ideas and findings presented here lay a foundation for using LLMs
sustainably and efficiently.
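As a rough illustration of the LLM cascade strategy (strategy 3 above), the Python sketch below queries models from cheapest to most expensive and stops once a quality scorer accepts an answer. The tier structure, prices, token estimate, and the score_answer scorer are assumptions made for illustration; this is not FrugalGPT's actual learned router.

```python
# Minimal LLM-cascade sketch (illustrative; not FrugalGPT's implementation).
# Models are tried from cheapest to most expensive; a quality scorer decides
# whether a cheaper answer is good enough to return without further calls.

from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelTier:
    name: str                       # e.g. a small open model, a mid-tier API, GPT-4
    cost_per_1k_tokens: float       # assumed price, for illustration only
    generate: Callable[[str], str]  # wraps the provider's completion API

def cascade_answer(query: str,
                   tiers: list[ModelTier],
                   score_answer: Callable[[str, str], float],
                   threshold: float = 0.8) -> tuple[str, float]:
    """Return (answer, estimated_cost); tiers must be sorted cheapest first."""
    spent, answer = 0.0, ""
    for tier in tiers:
        answer = tier.generate(query)
        # Very rough token estimate (~4 characters per token), purely illustrative.
        spent += tier.cost_per_1k_tokens * (len(query) + len(answer)) / 4000
        if score_answer(query, answer) >= threshold:
            break  # cheaper model's answer accepted; skip the stronger models
    return answer, spent
```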
Related papers
- Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval [55.63711219190506]
Large language models (LLMs) often struggle with posing the right search queries.
We introduce Learning to Retrieve by Trying (LeReT).
LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%.
arXiv Detail & Related papers (2024-10-30T17:02:54Z)
- LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweigh the Costs? [2.7820774076399957]
We compare effects of recent LLM augmentation methods with established ones on 6 datasets, 3 classifiers and 2 fine-tuning methods.
We show that LLM-based methods are worth deploying only when a very small number of seed examples is used.
arXiv Detail & Related papers (2024-08-29T13:01:42Z)
- Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models [79.46938238953916]
Fine-tuning large language models (LLMs) to diverse applications is crucial to meet complex demands.
Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs.
In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs.
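As a toy illustration of the delta-weight idea summarized above, the sketch below compresses the difference between a fine-tuned and a base weight matrix with a plain low-rank SVD; this is a generic approximation, not Delta-CoMe's mixed-precision scheme.

```python
# Toy low-rank compression of one fine-tuning delta (illustrative only).

import numpy as np

def compress_delta(w_finetuned: np.ndarray, w_base: np.ndarray, rank: int):
    """Return low-rank factors (U, V) approximating the fine-tuning delta."""
    delta = w_finetuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    U = u[:, :rank] * s[:rank]   # fold singular values into the left factor
    V = vt[:rank, :]
    return U, V                  # storage drops from m*n to rank*(m+n)

def reconstruct(w_base: np.ndarray, U: np.ndarray, V: np.ndarray) -> np.ndarray:
    return w_base + U @ V        # approximate fine-tuned weights at load time
```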
arXiv Detail & Related papers (2024-06-13T07:57:27Z)
- Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named 'Rephrase and Respond' (RaR), which allows large language models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance.
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
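A minimal sketch of the two-step prompting pattern described in this summary is below; llm_call stands in for whatever completion API is used, and the prompt wording is an assumption, not the paper's exact prompts.

```python
# Two-step "Rephrase and Respond" prompting sketch (illustrative).
# llm_call is a placeholder for any text-completion API call.

def rephrase_and_respond(question: str, llm_call) -> str:
    # Step 1: ask the model to restate and expand the question in its own words.
    rephrased = llm_call(
        f"Rephrase and expand the following question so it is unambiguous:\n{question}"
    )
    # Step 2: answer the rephrased question alongside the original one.
    return llm_call(
        f"Original question: {question}\n"
        f"Rephrased question: {rephrased}\n"
        "Answer the question."
    )
```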
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
- Cache me if you Can: an Online Cost-aware Teacher-Student framework to Reduce the Calls to Large Language Models [13.799197575126442]
Small and medium-sized enterprises (SMEs) cannot afford the cost of creating large task-specific training datasets.
Third-party services that allow them to prompt Large Language Models currently require a payment per call.
We propose a framework that allows reducing the calls to LLMs by caching previous responses and using them to train a local inexpensive model.
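A minimal sketch of the summarized idea, answering from a cached response or a cheap local student model when possible and calling the paid LLM only otherwise, is below; the cache policy, the student's predict/fit interface, and the retraining schedule are assumptions for illustration, not the paper's components.

```python
# Cost-aware caching sketch (illustrative): answer from a local student model
# when it is confident, otherwise call the paid LLM, cache its answer, and
# periodically retrain the student on the cached pairs.

cache: dict[str, str] = {}

def answer(query: str, student, llm_call, confidence_threshold: float = 0.9) -> str:
    if query in cache:
        return cache[query]                     # exact-match cache hit: no paid call
    label, confidence = student.predict(query)  # hypothetical local-model interface
    if confidence >= confidence_threshold:
        return label                            # student is confident: no paid call
    response = llm_call(query)                  # paid teacher call
    cache[query] = response
    if len(cache) % 100 == 0:                   # retrain the student occasionally
        student.fit(list(cache.keys()), list(cache.values()))
    return response
```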
arXiv Detail & Related papers (2023-10-20T10:05:07Z)
- Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning [19.472937476936636]
Large language models (LLMs) have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services.
In this paper, we study how to build an LLM cascade that reduces the cost of using LLMs.
Our proposed cascades can achieve performance comparable to using solely the stronger LLM but require only 40% of its cost.
arXiv Detail & Related papers (2023-10-04T18:21:17Z)
- EcoAssistant: Using LLM Assistant More Affordably and Accurately [36.29735258966917]
We contribute a framework, EcoAssistant, that enables large language models to answer code-driven queries more affordably and accurately.
First, it allows the LLM assistants to converse with an automatic code executor to iteratively refine code or to produce answers based on the execution results.
Second, we use a hierarchy of LLM assistants, which attempts to answer the query with weaker, cheaper LLMs before backing off to stronger, expensive ones.
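The first idea, iterative refinement against a code executor, can be sketched as below (the second, the assistant hierarchy, mirrors the cascade sketch shown after the abstract). The assistant_call function, prompts, and success check are illustrative assumptions rather than EcoAssistant's actual interface.

```python
# Iterative code-refinement loop sketch (illustrative): the assistant proposes
# code, an executor runs it, and the execution result is fed back until the
# query is answered or the retry budget is exhausted.

import subprocess
import tempfile

def run_snippet(code: str) -> tuple[bool, str]:
    """Execute a Python snippet and return (succeeded, combined output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run(["python", path], capture_output=True, text=True, timeout=30)
    return proc.returncode == 0, proc.stdout + proc.stderr

def solve_with_executor(query: str, assistant_call, max_rounds: int = 3) -> str:
    prompt = f"Write Python code to answer: {query}"
    output = ""
    for _ in range(max_rounds):
        code = assistant_call(prompt)        # hypothetical LLM call returning code
        ok, output = run_snippet(code)
        if ok:
            return output                    # successful execution ends the loop
        prompt = f"The code failed with:\n{output}\nFix it and try again."
    return output
```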
arXiv Detail & Related papers (2023-10-03T22:16:13Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
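As a toy contrast to unstructured weight pruning, the sketch below removes whole hidden neurons of a single layer pair so that the coupled row, bias, and column are deleted together; the magnitude-based importance score is a simplification, not LLM-Pruner's gradient-based, dependency-aware criterion.

```python
# Toy structural pruning of one hidden layer (illustrative, not LLM-Pruner).
# Removing a hidden neuron deletes a row of W1, its bias entry, and a column
# of W2 together, so the pruned network stays shape-consistent.

import numpy as np

def prune_hidden_neurons(W1, b1, W2, keep_ratio: float = 0.5):
    # W1: (hidden, in), b1: (hidden,), W2: (out, hidden)
    importance = np.linalg.norm(W1, axis=1) * np.linalg.norm(W2, axis=0)
    k = max(1, int(keep_ratio * W1.shape[0]))
    keep = np.argsort(importance)[-k:]   # indices of the most important neurons
    return W1[keep, :], b1[keep], W2[:, keep]
```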
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
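A rough sketch of such an augment-then-verify loop around a black-box LLM is below; the retrieve and verify modules and the prompt format are placeholders assumed for illustration, not the paper's plug-and-play modules.

```python
# Sketch of an augment-verify-revise loop around a black-box LLM (illustrative).

def augmented_answer(query: str, llm_call, retrieve, verify, max_revisions: int = 2) -> str:
    evidence = retrieve(query)                   # plug-in knowledge-retrieval module
    prompt = f"Evidence:\n{evidence}\n\nQuestion: {query}\nAnswer using only the evidence."
    answer = llm_call(prompt)
    for _ in range(max_revisions):
        ok, feedback = verify(answer, evidence)  # plug-in fact-checking module
        if ok:
            break
        # Feed automated feedback back to the same black-box model and retry.
        answer = llm_call(
            prompt + f"\n\nPrevious answer: {answer}\nFeedback: {feedback}\nRevise your answer."
        )
    return answer
```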
arXiv Detail & Related papers (2023-02-24T18:48:43Z)