Related papers: Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning

URL: http://arxiv.org/abs/2310.03094v3
Date: Thu, 8 Feb 2024 22:02:22 GMT
Title: Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
Authors: Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao
Abstract summary: Large language models (LLMs) have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs. Our proposed cascades can achieve performance comparable to using solely the stronger LLM but require only 40% of its cost.
Score: 19.472937476936636
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) such as GPT-4 have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the intuition that simpler questions can be addressed by a weaker but more affordable LLM, whereas only the challenging questions necessitate the stronger and more expensive LLM. To realize this decision-making, we consider the "answer consistency" of the weaker LLM as a signal of the question difficulty and propose several methods for the answer sampling and consistency checking, including one leveraging a mixture of two thought representations (i.e., Chain-of-Thought and Program-of-Thought). Through experiments on six reasoning benchmark datasets, with GPT-3.5-turbo and GPT-4 being the weaker and stronger LLMs, respectively, we demonstrate that our proposed LLM cascades can achieve performance comparable to using solely the stronger LLM but require only 40% of its cost.
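The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the callables `weak_sample` and `strong_answer`, the sample count, and the 0.8 agreement threshold are all hypothetical stand-ins, and simple majority voting is only one possible consistency check. The weaker model is sampled several times, alternating between Chain-of-Thought and Program-of-Thought prompting, and the question is escalated to the stronger model only when the samples disagree too much.

```python
from collections import Counter
from typing import Callable, List, Tuple


def consistency_vote(answers: List[str]) -> Tuple[str, float]:
    """Return the most frequent answer and the fraction of samples that agree with it."""
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)


def cascade(
    question: str,
    weak_sample: Callable[[str, str], str],   # hypothetical: (question, style) -> answer
    strong_answer: Callable[[str], str],      # hypothetical: question -> answer
    n_samples: int = 6,                       # assumed sample budget
    threshold: float = 0.8,                   # assumed agreement threshold
) -> Tuple[str, str]:
    """Answer with the weaker model when its samples agree; otherwise escalate."""
    # Alternate the two thought representations across the samples.
    styles = ["cot" if i % 2 == 0 else "pot" for i in range(n_samples)]
    answers = [weak_sample(question, style) for style in styles]
    answer, agreement = consistency_vote(answers)
    if agreement >= threshold:
        return answer, "weak"
    # Low agreement is treated as a sign that the question is hard.
    return strong_answer(question), "strong"


# Stub models for illustration only; a real setup would call LLM APIs here.
def fake_weak(question: str, style: str) -> str:
    if question == "2+2":
        return "4"                            # both styles agree on the easy question
    return "10" if style == "cot" else "12"   # styles disagree on the hard one


def fake_strong(question: str) -> str:
    return "11"
```

With these stubs, `cascade("2+2", fake_weak, fake_strong)` is answered by the weaker model, while any other question produces a 50% agreement rate and is passed to the stronger one. In practice the threshold controls the trade-off between cost and accuracy.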

Related papers

Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning [3.0449420665138485]
Large Language Models (LLMs) have raised interest in their formal reasoning capabilities, particularly in mathematics. We propose a post-training approach leveraging a mixture of opinions (MoO) from weaker ancillary LLMs to enhance a (relatively) stronger LLM's reasoning. Our results show that incorporating weaker LLMs' opinions improves mathematical reasoning by an average of 5%, highlighting the value of diverse perspectives in reasoning tasks.
arXiv Detail & Related papers (2025-02-26T23:22:02Z)
GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve reasoning accuracy with minimal external input. GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak).
arXiv Detail & Related papers (2024-10-11T03:05:06Z)
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks. However, LLMs are prone to produce errors, hallucinations, and inconsistent statements when performing multi-step reasoning. We introduce Q*, a framework for guiding the LLM decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z)
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvement of Large Language Models. It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop. Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z)
Enabling Weak LLMs to Judge Response Reliability via Meta Ranking [38.63721941742435]
We propose a novel cross-query-comparison-based method called Meta Ranking (MR). MR assesses reliability by ranking the target query-response pair pairwise against multiple reference query-response pairs. We show that MR can enhance strong LLMs' performance in two practical applications: model cascading and instruction tuning.
arXiv Detail & Related papers (2024-02-19T13:57:55Z)
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs [60.40396361115776]
This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in large language models (LLMs) with a slim proxy model. We employ a proxy model that has far fewer parameters and take its answers as heuristic answers. These heuristic answers are then used to predict the knowledge required to answer the user question, as well as the known and unknown knowledge within the LLM.
arXiv Detail & Related papers (2024-02-19T11:11:08Z)
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named 'Rephrase and Respond' (RaR), which allows Large Language Models to rephrase and expand questions posed by humans. RaR serves as a simple yet effective prompting method for improving performance. We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph [29.447300472617826]
Think-on-Graph (ToG) is a new approach for using external knowledge graphs (KGs) with large language models (LLMs). ToG iteratively executes beam search on the KG, discovers the most promising reasoning paths, and returns the most likely reasoning results. ToG achieves overall SOTA in 6 out of 9 datasets where most previous SOTAs rely on additional training.
arXiv Detail & Related papers (2023-07-15T03:31:38Z)
LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset. Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors [11.28397947587596]
Fine-tuning large language models (LLMs) on large-scale instruction-following datasets substantially improves their performance on a wide range of NLP tasks. However, even advanced instruction-tuned LLMs still fail to outperform small LMs on relation extraction (RE). We propose QA4RE, a framework that aligns RE with question answering (QA), a predominant task in instruction-tuning datasets.
arXiv Detail & Related papers (2023-05-18T17:48:03Z)
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance [36.94826820536239]
We review the cost associated with querying popular large language models (LLMs). We discuss three types of strategies that users can exploit to reduce the inference cost associated with using LLMs. Experiments show that FrugalGPT can match the performance of the best individual LLM with up to 98% cost reduction, or improve accuracy over GPT-4 by 4% at the same cost.
arXiv Detail & Related papers (2023-05-09T05:11:02Z)
Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples! [43.51393135075126]
Large Language Models (LLMs) have made remarkable strides in various tasks. We show that current advanced LLMs consistently exhibit inferior performance, higher latency, and increased budget requirements compared to fine-tuned small language models (SLMs). We propose an adaptive filter-then-rerank paradigm to combine the strengths of LLMs and SLMs.
arXiv Detail & Related papers (2023-03-15T12:20:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.