Reverse Thinking Makes LLMs Stronger Reasoners
- URL: http://arxiv.org/abs/2411.19865v1
- Date: Fri, 29 Nov 2024 17:27:05 GMT
- Title: Reverse Thinking Makes LLMs Stronger Reasoners
- Authors: Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister,
- Abstract summary: RevThink is a framework composed of data augmentation and learning objectives.
Experiments across 12 datasets show an average 13.53% improvement over the student model's zero-shot performance.
RevThink also exhibits strong generalization to out-of-distribution held-out datasets.
- Score: 90.42357659849215
- License:
- Abstract: Reverse thinking plays a crucial role in human reasoning. Humans can reason not only from a problem to a solution but also in reverse, i.e., start from the solution and reason towards the problem. This often enhances overall reasoning performance as it enables consistency checks between their forward and backward thinking. To enable Large Language Models (LLMs) to perform reverse thinking, we introduce Reverse-Enhanced Thinking (RevThink), a framework composed of data augmentation and learning objectives. In RevThink, we augment the dataset by collecting structured forward-backward reasoning from a teacher model, consisting of: (1) the original question, (2) forward reasoning, (3) backward question, and (4) backward reasoning. We then employ three objectives to train a smaller student model in a multi-task learning fashion: (a) generate forward reasoning from a question, (b) generate a backward question from a question, and (c) generate backward reasoning from the backward question. Experiments across 12 datasets covering commonsense, math, and logical reasoning show an average 13.53% improvement over the student model's zero-shot performance and a 6.84% improvement over the strongest knowledge distillation baselines. Moreover, our method demonstrates sample efficiency -- using only 10% of the correct forward reasoning from the training data, it outperforms a standard fine-tuning method trained on 10x more forward reasoning. RevThink also exhibits strong generalization to out-of-distribution held-out datasets.
Related papers
- Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning [23.99454995087634]
We explore the potential of rule-based reinforcement learning in large reasoning models.
We use synthetic logic puzzles as training data due to their controllable complexity and straightforward answer verification.
Our 7B model develops advanced reasoning skills-such as reflection, verification, and summarization-that are absent from the logic corpus.
arXiv Detail & Related papers (2025-02-20T17:49:26Z) - Toward Adaptive Reasoning in Large Language Models with Thought Rollback [33.714789952452094]
This paper proposes a new reasoning framework, called Thought Rollback (TR)
TR allows large language models (LLMs) to adaptively build thought structure while maintaining effective reasoning toward problem-solving under hallucinations''
arXiv Detail & Related papers (2024-12-27T16:02:34Z) - Vision-Language Models Can Self-Improve Reasoning via Reflection [20.196406628954303]
Chain-of-thought (CoT) has proven to improve the reasoning capability of large language models (LLMs)
We propose a self-training framework, R3V, which iteratively enhances the model's Vision-language Reasoning by Reflecting on CoT Rationales.
Our approach supports self-reflection on generated solutions, further boosting performance through test-time computation.
arXiv Detail & Related papers (2024-10-30T14:45:00Z) - Improve Vision Language Model Chain-of-thought Reasoning [86.83335752119741]
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness.
We show that training VLM on short answers does not generalize well to reasoning tasks that require more detailed responses.
arXiv Detail & Related papers (2024-10-21T17:00:06Z) - Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths [69.39559168050923]
We introduce Reasoning Paths Optimization (RPO), which enables learning to reason and explore from diverse paths.
Our approach encourages favorable branches at each reasoning step while penalizing unfavorable ones, enhancing the model's overall problem-solving performance.
We focus on multi-step reasoning tasks, such as math word problems and science-based exam questions.
arXiv Detail & Related papers (2024-10-07T06:37:25Z) - Implicit Chain of Thought Reasoning via Knowledge Distillation [58.80851216530288]
Instead of explicitly producing the chain of thought reasoning steps, we use the language model's internal hidden states to perform implicit reasoning.
We find that this approach enables solving tasks previously not solvable without explicit chain-of-thought, at a speed comparable to no chain-of-thought.
arXiv Detail & Related papers (2023-11-02T17:59:49Z) - Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems [17.80128896525717]
backward reasoning is relatively unexplored.
backward reasoning can be seen as the ''inverse'' of forward reasoning.
We propose variations of three different forward reasoning strategies to improve performance.
arXiv Detail & Related papers (2023-10-03T12:03:06Z) - Forward-Backward Reasoning in Large Language Models for Mathematical Verification [65.9495774606273]
Self-Consistency samples diverse reasoning chains with answers and chooses the final answer by majority voting.
We introduce backward reasoning to verify candidate answers.
FOrward and BAckward Reasoning for verification achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-15T13:19:59Z) - Deep Feedback Inverse Problem Solver [141.26041463617963]
We present an efficient, effective, and generic approach towards solving inverse problems.
We leverage the feedback signal provided by the forward process and learn an iterative update model.
Our approach does not have any restrictions on the forward process; it does not require any prior knowledge either.
arXiv Detail & Related papers (2021-01-19T16:49:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.