Least-to-Most Prompting Enables Complex Reasoning in Large Language
Models
- URL: http://arxiv.org/abs/2205.10625v3
- Date: Sun, 16 Apr 2023 22:08:08 GMT
- Title: Least-to-Most Prompting Enables Complex Reasoning in Large Language
Models
- Authors: Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales,
Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi
- Abstract summary: We propose a novel prompting strategy, least-to-most prompting, to overcome the challenge of easy-to-hard generalization.
We show that least-to-most prompting is capable of generalizing to more difficult problems than those seen in the prompts.
By contrast, neural-symbolic models in the literature that specialize in solving SCAN are trained on the entire training set, which contains over 15,000 examples.
- Score: 52.59923418570378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chain-of-thought prompting has demonstrated remarkable performance on various
natural language reasoning tasks. However, it tends to perform poorly on tasks
that require solving problems harder than the exemplars shown in the prompts.
To overcome this challenge of easy-to-hard generalization, we propose a novel
prompting strategy, least-to-most prompting. The key idea in this strategy is
to break down a complex problem into a series of simpler subproblems and then
solve them in sequence. Solving each subproblem is facilitated by the answers
to previously solved subproblems. Our experimental results on tasks related to
symbolic manipulation, compositional generalization, and math reasoning reveal
that least-to-most prompting is capable of generalizing to more difficult
problems than those seen in the prompts. A notable finding is that when the
GPT-3 code-davinci-002 model is used with least-to-most prompting, it can solve
the compositional generalization benchmark SCAN in any split (including length
split) with an accuracy of at least 99% using just 14 exemplars, compared to
only 16% accuracy with chain-of-thought prompting. This is particularly
noteworthy because neural-symbolic models in the literature that specialize in
solving SCAN are trained on the entire training set containing over 15,000
examples. We have included prompts for all the tasks in the Appendix.
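The two-stage procedure described above can be summarized in code. The sketch below is illustrative only: it assumes a generic `complete(prompt)` callable that wraps an LLM completion API (e.g., code-davinci-002), and the exemplar text, the `;`-separated decomposition format, and the helper names are placeholders rather than the paper's actual prompts (those are given in the Appendix).

```python
from typing import Callable, List

def decompose(problem: str, complete: Callable[[str], str]) -> List[str]:
    """Stage 1: ask the model to reduce the problem to simpler subproblems.

    The exemplar and the ';'-separated output format are illustrative
    placeholders, not the prompts used in the paper.
    """
    prompt = (
        "Q: <hard example problem>\n"
        "A: To solve this, we need to first answer: <subproblem 1>; <subproblem 2>.\n\n"
        f"Q: {problem}\n"
        "A: To solve this, we need to first answer:"
    )
    return [s.strip(" .") for s in complete(prompt).split(";") if s.strip(" .")]

def least_to_most(problem: str, complete: Callable[[str], str]) -> str:
    """Stage 2: solve the subproblems in sequence, simplest first.

    Each answer is appended to the growing context, so later subproblems
    (and finally the original problem) can build on earlier answers.
    """
    context = f"Problem: {problem}\n"
    answer = ""
    for sub in decompose(problem, complete) + [problem]:
        context += f"Q: {sub}\nA:"
        answer = complete(context).strip()
        context += f" {answer}\n"
    return answer
```

In practice, `complete` would wrap an API call, and few-shot solution exemplars would be prepended to `context` in the same question/answer format.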
Related papers
- Chain of Thoughtlessness? An Analysis of CoT in Planning [17.329365493094542]
Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution.
This paper presents a case study of chain of thought on problems from Blocksworld, a classical planning domain.
We find meaningful performance improvements from chain of thought prompts when those prompts are exceedingly specific to their problem class.
arXiv Detail & Related papers (2024-05-08T02:48:28Z)
- An Examination on the Effectiveness of Divide-and-Conquer Prompting in Large Language Models [28.139780691709266]
We provide a theoretical analysis of the divide-and-conquer (DaC) prompting strategy, which helps identify the specific tasks where DaC prompting can bring a performance boost with a theoretical guarantee.
We present two cases (large integer arithmetic and fact verification) where experimental results align with our theoretical analysis.
arXiv Detail & Related papers (2024-02-08T02:37:30Z)
- A Hybrid System for Systematic Generalization in Simple Arithmetic Problems [70.91780996370326]
We propose a hybrid system capable of solving arithmetic problems that require compositional and systematic reasoning over sequences of symbols.
We show that the proposed system can accurately solve nested arithmetical expressions even when trained only on a subset including the simplest cases.
arXiv Detail & Related papers (2023-06-29T18:35:41Z)
- Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z)
- Successive Prompting for Decomposing Complex Questions [50.00659445976735]
Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting.
We introduce "Successive Prompting", where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution (a minimal sketch of this loop appears after this list).
Our best model (with successive prompting) achieves an improvement of 5% absolute F1 on a few-shot version of the DROP dataset.
arXiv Detail & Related papers (2022-12-08T06:03:38Z)
- PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PAL), in which the LLM reads a natural language problem and generates a program as its intermediate reasoning.
PAL offloads the solution step to a programmatic runtime such as a Python interpreter (see the sketch after this list).
We set new state-of-the-art results in all 12 benchmarks.
arXiv Detail & Related papers (2022-11-18T18:56:13Z) - Complexity-Based Prompting for Multi-Step Reasoning [72.0057198610614]
We study the task of prompting large-scale language models to perform multi-step reasoning.
A central question is which reasoning examples make the most effective prompts.
We propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning (a minimal sketch of the selection heuristic follows this list).
arXiv Detail & Related papers (2022-10-03T05:33:27Z)
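For the Successive Prompting entry above, here is a minimal sketch of the interleaved loop its summary describes, under the same assumed `complete` callable: the model is alternately asked for the next simple sub-question and for its answer, and the loop stops at an assumed "final" marker. The prompt formats and stop convention are illustrative, not the paper's.

```python
from typing import Callable

def successive_prompting(question: str, complete: Callable[[str], str],
                         max_steps: int = 10) -> str:
    """Alternate question-decomposition and question-answering steps."""
    context = f"Complex question: {question}\n"
    answer = ""
    for _ in range(max_steps):
        # Decomposition step: ask for the next simple sub-question.
        sub_q = complete(context + "Next question:").strip()
        # Answering step: answer that sub-question given the context so far.
        answer = complete(context + f"Q: {sub_q}\nA:").strip()
        context += f"Q: {sub_q}\nA: {answer}\n"
        # Assumed termination convention: the model marks the final question.
        if sub_q.lower().startswith("final"):
            break
    return answer
```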
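For the PAL entry, a sketch of the program-aided idea: the model writes the reasoning as a short Python program and the interpreter computes the answer. The exemplar is an illustrative placeholder, `complete` is the same assumed LLM callable, and running model output with `exec` is for illustration only.

```python
from typing import Callable

# Illustrative exemplar: reasoning expressed as executable Python.
PAL_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "# solution in Python:\n"
    "tennis_balls = 5\n"
    "bought_balls = 2 * 3\n"
    "answer = tennis_balls + bought_balls\n\n"
)

def pal(question: str, complete: Callable[[str], str]):
    """The model generates a Python program; the interpreter runs it."""
    program = complete(PAL_EXEMPLAR + f"Q: {question}\n# solution in Python:\n")
    namespace: dict = {}
    exec(program, namespace)  # illustration only; sandbox untrusted code in practice
    return namespace.get("answer")
```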
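For the Complexity-Based Prompting entry, a sketch of the selection heuristic: among annotated exemplars, prefer those whose reasoning chains contain the most steps. Counting steps as newline-separated lines is a simplifying assumption about how complexity is measured; the function and parameter names are hypothetical.

```python
from typing import List, Tuple

def select_complex_exemplars(annotated: List[Tuple[str, str]],
                             k: int = 8) -> List[Tuple[str, str]]:
    """Keep the k (question, rationale) pairs with the most reasoning steps.

    Counting steps as newline-separated lines of the rationale is a
    simplifying assumption.
    """
    return sorted(annotated, key=lambda qa: len(qa[1].splitlines()),
                  reverse=True)[:k]

def build_prompt(exemplars: List[Tuple[str, str]], question: str) -> str:
    """Assemble a few-shot prompt from the selected complex exemplars."""
    shots = "\n\n".join(f"Q: {q}\nA: {r}" for q, r in exemplars)
    return f"{shots}\n\nQ: {question}\nA:"
```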
This list is automatically generated from the titles and abstracts of the papers on this site.