Take a Step Back: Evoking Reasoning via Abstraction in Large Language
Models
- URL: http://arxiv.org/abs/2310.06117v2
- Date: Tue, 12 Mar 2024 04:38:27 GMT
- Title: Take a Step Back: Evoking Reasoning via Abstraction in Large Language
Models
- Authors: Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, Heng-Tze Cheng, Ed
H. Chi, Quoc V Le and Denny Zhou
- Abstract summary: Step-Back Prompting enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details.
Using the concepts and principles to guide reasoning, LLMs significantly improve their abilities in following a correct reasoning path towards the solution.
- Score: 122.19845578690466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Step-Back Prompting, a simple prompting technique that enables
LLMs to do abstractions to derive high-level concepts and first principles from
instances containing specific details. Using the concepts and principles to
guide reasoning, LLMs significantly improve their abilities in following a
correct reasoning path towards the solution. We conduct experiments of
Step-Back Prompting with PaLM-2L, GPT-4 and Llama2-70B models, and observe
substantial performance gains on various challenging reasoning-intensive tasks
including STEM, Knowledge QA, and Multi-Hop Reasoning. For instance, Step-Back
Prompting improves PaLM-2L performance on MMLU (Physics and Chemistry) by 7%
and 11% respectively, TimeQA by 27%, and MuSiQue by 7%.
Related papers
- A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics [9.681821524089761]
We present a survey of strategies utilizing feedback at the step and outcome levels to enhance multi-step math reasoning for LLMs.
As multi-step reasoning emerges a crucial component in scaling LLMs, we hope to establish its foundation for easier understanding and empower further research.
arXiv Detail & Related papers (2025-02-20T07:31:00Z) - LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs [103.0226977561914]
We propose a comprehensive framework for advancing step-by-step visual reasoning in large language models.
We introduce a visual reasoning benchmark specifically designed to evaluate multi-step reasoning tasks.
Second, we propose a novel metric that assesses visual reasoning quality at the granularity of individual steps.
Third, we present a new multimodal visual reasoning model, named LlamaV-o1, trained using a multi-step curriculum learning approach.
arXiv Detail & Related papers (2025-01-10T18:59:51Z) - BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning [83.03531832811386]
BoostStep is a method that enhances reasoning accuracy through step-aligned ICL examples.
It integrates seamlessly with chain-of-thought (CoT) and tree search algorithms.
It improves DeepSeek-R1-671B's performance on AIME by 2.2%, leveraging simple examples only from the MATH dataset.
arXiv Detail & Related papers (2025-01-06T18:59:13Z) - Understanding the Role of LLMs in Multimodal Evaluation Benchmarks [77.59035801244278]
This paper investigates the role of the Large Language Model (LLM) backbone in Multimodal Large Language Models (MLLMs) evaluation.
Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs.
Key findings reveal that some benchmarks allow high performance even without visual inputs and up to 50% of error rates can be attributed to insufficient world knowledge in the LLM backbone.
arXiv Detail & Related papers (2024-10-16T07:49:13Z) - Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - Reinforcement Learning Problem Solving with Large Language Models [0.0]
Large Language Models (LLMs) have an extensive amount of world knowledge, and this has enabled their application in various domains to improve the performance of Natural Language Processing (NLP) tasks.
This has also facilitated a more accessible paradigm of conversation-based interactions between humans and AI systems to solve intended problems.
We show the practicality of our approach through two detailed case studies for "Research Scientist" and "Legal Matter Intake"
arXiv Detail & Related papers (2024-04-29T12:16:08Z) - TRACE: A Comprehensive Benchmark for Continual Learning in Large
Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z) - Large Language Models are Zero-Shot Reasoners [28.6899375595088]
Chain of thought (CoT) prompting is a technique for eliciting complex multi-step reasoning through step-by-step answer examples.
We show that LLMs are decent zero-shot reasoners by simply adding Let's think step by step'' before each answer.
Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performances.
arXiv Detail & Related papers (2022-05-24T09:22:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.