StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving
- URL: http://arxiv.org/abs/2311.08803v4
- Date: Sat, 09 Nov 2024 13:52:50 GMT
- Title: StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving
- Authors: Chang Gao, Haiyun Jiang, Deng Cai, Shuming Shi, Wai Lam
- Abstract summary: StrategyLLM allows LLMs to perform inductive reasoning, deriving general strategies from specific task instances, and deductive reasoning, applying these general strategies to particular task examples, for constructing generalizable and consistent few-shot prompts.
Experimental results demonstrate that StrategyLLM outperforms the competitive baseline CoT-SC, which requires human-annotated solutions, on 13 datasets across 4 challenging tasks without human involvement, including math reasoning (34.2% $\rightarrow$ 38.8%), commonsense reasoning (70.3% $\rightarrow$ 72.5%), algorithmic reasoning (73.7% $\rightarrow$ 85.0%), and symbolic reasoning (30.0% $\rightarrow$ 79.2%).
- Abstract: Most existing prompting methods suffer from the issues of generalizability and consistency, as they often rely on instance-specific solutions that may not be applicable to other instances and lack task-level consistency across the selected few-shot examples. To address these limitations, we propose a comprehensive framework, StrategyLLM, allowing LLMs to perform inductive reasoning, deriving general strategies from specific task instances, and deductive reasoning, applying these general strategies to particular task examples, for constructing generalizable and consistent few-shot prompts. It employs four LLM-based agents: strategy generator, executor, optimizer, and evaluator, working together to generate, evaluate, and select promising strategies for a given task. Experimental results demonstrate that StrategyLLM outperforms the competitive baseline CoT-SC that requires human-annotated solutions on 13 datasets across 4 challenging tasks without human involvement, including math reasoning (34.2\% $\rightarrow$ 38.8\%), commonsense reasoning (70.3\% $\rightarrow$ 72.5\%), algorithmic reasoning (73.7\% $\rightarrow$ 85.0\%), and symbolic reasoning (30.0\% $\rightarrow$ 79.2\%). Further analysis reveals that StrategyLLM is applicable to various LLMs and demonstrates advantages across numerous scenarios.
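The four-agent loop described in the abstract can be pictured with a short sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the `complete` helper stands in for any chat-completion API, and the prompt templates, candidate count, and accuracy threshold are invented for illustration.
```python
# Minimal sketch of the StrategyLLM generate/execute/optimize/evaluate loop.
# `complete` is a stub for any LLM client; all prompts are illustrative only.

def complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def generate_strategies(task: str, examples: list, k: int = 3) -> list[str]:
    """Strategy generator: induce k general strategies from task instances."""
    prompt = (f"Task: {task}\nExamples:\n{examples}\n"
              f"Propose {k} general problem-solving strategies, one per line.")
    return complete(prompt).splitlines()[:k]

def execute(strategy: str, example: dict) -> str:
    """Strategy executor: apply a general strategy to one instance (deduction)."""
    return complete(f"Strategy: {strategy}\nProblem: {example['question']}\n"
                    "Solve step by step, then give the final answer.")

def evaluate(strategy: str, dev_set: list[dict]) -> float:
    """Strategy evaluator: score a strategy by execution accuracy on a dev set."""
    answers = [execute(strategy, ex) for ex in dev_set]
    return sum(ex["answer"] in a for ex, a in zip(dev_set, answers)) / len(dev_set)

def optimize(strategy: str, feedback: str) -> str:
    """Strategy optimizer: revise a strategy using execution feedback."""
    return complete(f"Strategy: {strategy}\nFeedback: {feedback}\nRevise the strategy.")

def strategy_llm(task: str, dev_set: list[dict], threshold: float = 0.8) -> str:
    candidates = generate_strategies(task, dev_set[:2])
    kept = []
    for s in candidates:
        score = evaluate(s, dev_set)
        if score < threshold:                  # below threshold: optimize once
            s = optimize(s, f"dev accuracy {score:.2f}")
            score = evaluate(s, dev_set)
        kept.append((score, s))
    return max(kept)[1]                        # keep the most promising strategy
```
The abstract mentions selecting promising strategies in the plural; this sketch keeps only the single best one for brevity.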
Related papers
- HPSS: Heuristic Prompting Strategy Search for LLM Evaluators [81.09765876000208]
We propose a novel automatic prompting strategy optimization method called Heuristic Prompting Strategy Search (HPSS).
Inspired by the genetic algorithm, HPSS conducts an iterative search to find well-behaved prompting strategies for evaluators.
Extensive experiments across four evaluation tasks demonstrate the effectiveness of HPSS.
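A toy rendering of the genetic-algorithm-style search the summary describes might look as follows; the component pools, mutation rate, and the `score_evaluator` fitness stub are hypothetical stand-ins, not HPSS's actual design.
```python
# Illustrative genetic-style search over prompting strategies in the spirit
# of HPSS; component pools and hyperparameters are invented for illustration.
import random

COMPONENTS = {                       # factors composing a prompting strategy
    "scoring": ["1-5 scale", "0-100 scale", "pairwise"],
    "rubric": ["with rubric", "without rubric"],
    "cot": ["think step by step", "answer directly"],
}

def random_strategy() -> dict:
    return {k: random.choice(v) for k, v in COMPONENTS.items()}

def mutate(strategy: dict, rate: float = 0.3) -> dict:
    child = dict(strategy)
    for k, pool in COMPONENTS.items():
        if random.random() < rate:
            child[k] = random.choice(pool)
    return child

def score_evaluator(strategy: dict) -> float:
    """Fitness stub: agreement of the LLM evaluator with human judgments."""
    raise NotImplementedError("plug in an evaluation run here")

def hpss_search(generations: int = 10, population: int = 8, keep: int = 4) -> dict:
    pool = [random_strategy() for _ in range(population)]
    for _ in range(generations):
        elite = sorted(pool, key=score_evaluator, reverse=True)[:keep]
        pool = elite + [mutate(random.choice(elite)) for _ in range(population - keep)]
    return max(pool, key=score_evaluator)
```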
arXiv Detail & Related papers (2025-02-18T16:46:47Z)
- Doing More with Less -- Implementing Routing Strategies in Large Language Model-Based Systems: An Extended Survey [1.430963201405577]
Most Large Language Model (LLM)-based systems rely on a single LLM for all user queries.
However, different queries often require different preprocessing strategies, levels of reasoning, or knowledge.
This paper explores key considerations for integrating routing into LLM-based systems.
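As a toy illustration of the routing idea, a system might classify each query and dispatch it to a differently sized model; the heuristic and model names below are invented.
```python
# Hypothetical query router: cheap heuristic classification, then dispatch.
MODELS = {"simple": "small-model", "complex": "large-model"}

def classify(query: str) -> str:
    # invented heuristic: long queries go to the large model
    return "complex" if len(query.split()) > 30 else "simple"

def route(query: str, complete_with) -> str:
    # complete_with(model_name, prompt) is an assumed two-argument LLM helper
    return complete_with(MODELS[classify(query)], query)
```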
arXiv Detail & Related papers (2025-02-01T12:08:38Z) - How Strategic Agents Respond: Comparing Analytical Models with LLM-Generated Responses in Strategic Classification [9.296248945826084]
We propose using strategic advice generated by large language models to simulate human agent responses in Strategic Classification (SC).
We examine five critical SC scenarios -- hiring, loan applications, school admissions, personal income, and public assistance programs.
We then compare the resulting agent responses with the best responses generated by existing theoretical models.
arXiv Detail & Related papers (2025-01-20T01:39:03Z) - Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation [16.350747493026432]
The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs).
We propose Strategic Chain-of-Thought (SCoT) to refine LLM performance by integrating strategic knowledge prior to generating intermediate reasoning steps.
SCoT employs a two-stage approach within a single prompt: first eliciting an effective problem-solving strategy, which is then used to guide the generation of high-quality CoT paths and final answers.
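The two-stage, single-prompt structure could be sketched as a template like the one below; the wording is an assumption, not the paper's exact prompt, and `complete` stands in for any LLM client.
```python
# Sketch of SCoT's two stages folded into a single prompt.
SCOT_TEMPLATE = """\
Problem: {problem}

Stage 1 -- Strategy: Before solving, state the most effective general
strategy (method, formula, or plan) for this kind of problem.

Stage 2 -- Solution: Apply the strategy above step by step and end with
"Answer:" followed by the final answer.
"""

def scot(problem: str, complete) -> str:
    return complete(SCOT_TEMPLATE.format(problem=problem))
```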
arXiv Detail & Related papers (2024-09-05T06:28:05Z)
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
- Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs).
MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task.
We evaluate the effectiveness of MRP through comprehensive benchmarks.
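In spirit, MRP amounts to a system prompt that asks the model to pick a reasoning method before solving; the method pool and wording below are illustrative assumptions, not the paper's actual prompt.
```python
# Sketch of meta-reasoning prompting: select a method, then apply it.
METHODS = ["Chain-of-Thought", "Step-Back", "Decomposition", "Analogical reasoning"]

MRP_SYSTEM = (
    "You are given a task. First name the single most suitable reasoning "
    f"method from: {', '.join(METHODS)}; then solve the task with it."
)

def meta_reason(task: str, complete) -> str:
    # `complete` is an assumed chat-completion helper
    return complete(f"{MRP_SYSTEM}\n\nTask: {task}")
```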
arXiv Detail & Related papers (2024-06-17T16:14:11Z)
- Self-Guiding Exploration for Combinatorial Problems [2.636330943305939]
Self-Guiding Exploration (SGE) is designed to enhance performance in solving Combinatorial Problems (CPs).
SGE operates autonomously, generating multiple thought trajectories for each CP task.
It then breaks these trajectories down into actionable subtasks, executes them sequentially, and refines the results to ensure optimal outcomes.
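The pipeline reads as a four-step loop, sketched below under the assumption of a generic `complete` helper; the prompts and trajectory count are illustrative.
```python
# Sketch of the SGE pipeline: trajectories -> subtasks -> execution -> refinement.
def sge(cp_task: str, complete, n_trajectories: int = 3) -> str:
    results = []
    for _ in range(n_trajectories):
        # 1. Generate one high-level thought trajectory for the CP instance.
        trajectory = complete(f"Outline an approach to solve: {cp_task}")
        # 2. Break the trajectory into actionable subtasks.
        subtasks = complete(f"Split into numbered subtasks:\n{trajectory}").splitlines()
        # 3. Execute subtasks sequentially, feeding each result forward.
        state = cp_task
        for sub in subtasks:
            state = complete(f"Context: {state}\nDo subtask: {sub}")
        results.append(state)
    # 4. Refine: merge the trajectories' outcomes into one final solution.
    return complete("Pick or merge the best solution:\n" + "\n---\n".join(results))
```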
arXiv Detail & Related papers (2024-05-28T08:26:54Z)
- Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
However, when used as agents to tackle complex real-world problems, their performance is far inferior to that of large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z)
- DRDT: Dynamic Reflection with Divergent Thinking for LLM-based Sequential Recommendation [53.62727171363384]
We introduce a novel reasoning principle: Dynamic Reflection with Divergent Thinking.
Our methodology centers on dynamic reflection, a process that emulates human learning through probing, critiquing, and reflecting.
We evaluate our approach on three datasets using six pre-trained LLMs.
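A dynamic-reflection loop in this spirit might be sketched as follows; the prompts and iteration count are assumptions, with `complete` standing in for any LLM client.
```python
# Sketch of a probe-critique-reflect loop for sequential recommendation.
def dynamic_reflection(history: list[str], complete, rounds: int = 3) -> str:
    answer = complete(f"User history: {history}\nRecommend the next item.")
    for _ in range(rounds):
        # Critique the current recommendation, then reflect and revise.
        critique = complete(f"History: {history}\nRecommendation: {answer}\n"
                            "Critique this recommendation: what does it miss?")
        answer = complete(f"History: {history}\nPrevious: {answer}\n"
                          f"Critique: {critique}\nGive an improved recommendation.")
    return answer
```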
arXiv Detail & Related papers (2023-12-18T16:41:22Z)
- Which is better? Exploring Prompting Strategy For LLM-based Metrics [6.681126871165601]
This paper describes the DSBA submissions to the Prompting Large Language Models as Explainable Metrics shared task.
Traditional similarity-based metrics such as BLEU and ROUGE have been shown to misalign with human evaluation and are ill-suited for open-ended generation tasks.
arXiv Detail & Related papers (2023-11-07T06:36:39Z)