Related papers: Reverse Thinking Makes LLMs Stronger Reasoners

URL: http://arxiv.org/abs/2411.19865v2
Date: Fri, 07 Mar 2025 20:33:35 GMT
Title: Reverse Thinking Makes LLMs Stronger Reasoners
Authors: Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister
Abstract summary: RevThink is a framework composed of data augmentation and learning objectives. Experiments across 12 datasets show an average 13.53% improvement over the student model's zero-shot performance. RevThink also exhibits strong generalization to out-of-distribution held-out datasets.
Score: 90.42357659849215
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reverse thinking plays a crucial role in human reasoning. Humans can reason not only from a problem to a solution but also in reverse, i.e., start from the solution and reason towards the problem. This often enhances overall reasoning performance as it enables consistency checks between their forward and backward thinking. To enable Large Language Models (LLMs) to perform reverse thinking, we introduce Reverse-Enhanced Thinking (RevThink), a framework composed of data augmentation and learning objectives. In RevThink, we augment the dataset by collecting structured forward-backward reasoning from a teacher model, consisting of: (1) the original question, (2) forward reasoning, (3) backward question, and (4) backward reasoning. We then employ three objectives to train a smaller student model in a multi-task learning fashion: (a) generate forward reasoning from a question, (b) generate a backward question from a question, and (c) generate backward reasoning from the backward question. Experiments across 12 datasets covering commonsense, math, and logical reasoning show an average 13.53% improvement over the student model's zero-shot performance and a 6.84% improvement over the strongest knowledge distillation baselines. Moreover, our method demonstrates sample efficiency -- using only 10% of the correct forward reasoning from the training data, it outperforms a standard fine-tuning method trained on 10x more forward reasoning. RevThink also exhibits strong generalization to out-of-distribution held-out datasets.
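The three training objectives named in the abstract can be pictured as three (input, target) pairs built from one augmented sample. The following is a minimal sketch, not the authors' implementation: the class `AugmentedSample`, the function `build_training_pairs`, the task labels, and the toy arithmetic problem are all hypothetical names and data made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class AugmentedSample:
    """One teacher-augmented record with the four fields listed in the abstract."""
    question: str
    forward_reasoning: str
    backward_question: str
    backward_reasoning: str


def build_training_pairs(sample: AugmentedSample) -> list[dict]:
    """Turn one augmented sample into three (input, target) pairs.

    The pairs mirror objectives (a)-(c) in the abstract. The task labels
    are hypothetical; the abstract does not specify prompt formats.
    """
    return [
        # (a) generate forward reasoning from the question
        {"task": "forward_reasoning",
         "input": sample.question,
         "target": sample.forward_reasoning},
        # (b) generate a backward question from the question
        {"task": "backward_question",
         "input": sample.question,
         "target": sample.backward_question},
        # (c) generate backward reasoning from the backward question
        {"task": "backward_reasoning",
         "input": sample.backward_question,
         "target": sample.backward_reasoning},
    ]


# Toy example invented for illustration; it is not taken from the paper.
toy = AugmentedSample(
    question="Tom has 3 apples and buys 2 more. How many apples does he have?",
    forward_reasoning="3 + 2 = 5, so Tom has 5 apples.",
    backward_question="Tom buys 2 apples and ends up with 5. How many did he start with?",
    backward_reasoning="5 - 2 = 3, so Tom started with 3 apples.",
)

for pair in build_training_pairs(toy):
    print(pair["task"], "|", pair["input"], "->", pair["target"])
```

In a multi-task setup, pairs like these from every sample would be pooled into one fine-tuning set for the student model; how the losses are weighted or the prompts are worded is not stated in the abstract and is left open here.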

Related papers

SEAL: Steerable Reasoning Calibration of Large Language Models for Free [58.190800043449336]
Large Language Models (LLMs) have demonstrated compelling capabilities for complex reasoning tasks via the extended chain-of-thought (CoT) reasoning mechanism. Recent studies reveal substantial redundancy in the CoT reasoning traces, which negatively impacts model performance. We introduce SEAL, a training-free approach that seamlessly calibrates the CoT process, improving accuracy while demonstrating significant efficiency gains.
arXiv Detail & Related papers (2025-04-07T02:42:07Z)
Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning [84.2749507577386]
We introduce Retro-Search, an MCTS-inspired search algorithm, for distilling higher quality reasoning paths from large models. Retro-Search retrospectively revises reasoning paths to discover better, yet shorter traces, which can lead to student models with enhanced reasoning capabilities. Our approach can enable two use cases: self-improvement, where models are fine-tuned on their own Retro-Search-ed traces, and weak-to-strong improvement.
arXiv Detail & Related papers (2025-04-06T06:23:27Z)
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs [28.565225092457897]
Reinforcement learning can drive self-improvement in language models on verifiable tasks. We find that Qwen-2.5-3B far exceeds Llama-3.2-3B under identical RL training for the game of Countdown. Our study reveals that Qwen naturally exhibits these reasoning behaviors, whereas Llama initially lacks them.
arXiv Detail & Related papers (2025-03-03T08:46:22Z)
Evaluating Social Biases in LLM Reasoning [19.824838766883534]
This paper evaluated the 8B and 32B variants of DeepSeek-R1 against their instruction-tuned counterparts on the BBQ dataset. To the best of our knowledge, this empirical study is the first to assess bias issues in LLM reasoning.
arXiv Detail & Related papers (2025-02-21T10:16:07Z)
Vision-Language Models Can Self-Improve Reasoning via Reflection [20.196406628954303]
Chain-of-thought (CoT) has proven to improve the reasoning capability of large language models (LLMs). We propose a self-training framework, R3V, which iteratively enhances the model's Vision-language Reasoning by Reflecting on CoT Rationales. Our approach supports self-reflection on generated solutions, further boosting performance through test-time computation.
arXiv Detail & Related papers (2024-10-30T14:45:00Z)
Improve Vision Language Model Chain-of-thought Reasoning [86.83335752119741]
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. We show that training VLM on short answers does not generalize well to reasoning tasks that require more detailed responses.
arXiv Detail & Related papers (2024-10-21T17:00:06Z)
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths [69.39559168050923]
We introduce Reasoning Paths Optimization (RPO), which enables learning to reason and explore from diverse paths. Our approach encourages favorable branches at each reasoning step while penalizing unfavorable ones, enhancing the model's overall problem-solving performance. We focus on multi-step reasoning tasks, such as math word problems and science-based exam questions.
arXiv Detail & Related papers (2024-10-07T06:37:25Z)
Distilling Reasoning Ability from Large Language Models with Adaptive Thinking [54.047761094420174]
Chain of thought finetuning (cot-finetuning) aims to endow small language models (SLM) with reasoning ability to improve their performance towards specific tasks. Most existing cot-finetuning methods adopt a pre-thinking mechanism, allowing the SLM to generate a rationale before providing an answer. This mechanism enables SLM to analyze and think about complex questions, but it also makes answer correctness highly sensitive to minor errors in rationale. We propose a robust post-thinking mechanism to generate answers before rationale.
arXiv Detail & Related papers (2024-04-14T07:19:27Z)
Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems [17.80128896525717]
Backward reasoning is relatively unexplored. Backward reasoning can be seen as the "inverse" of forward reasoning. We propose variations of three different forward reasoning strategies to improve performance.
arXiv Detail & Related papers (2023-10-03T12:03:06Z)
Forward-Backward Reasoning in Large Language Models for Mathematical Verification [65.9495774606273]
Self-Consistency samples diverse reasoning chains with answers and chooses the final answer by majority voting. We introduce backward reasoning to verify candidate answers. FOrward and BAckward Reasoning for verification achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-15T13:19:59Z)
REFINER: Reasoning Feedback on Intermediate Representations [47.36251998678097]
We introduce REFINER, a framework for finetuning language models to generate intermediate inferences. REFINER works by interacting with a critic model that provides automated feedback on the reasoning. Empirical evaluations show significant improvements over baseline LMs of comparable scale.
arXiv Detail & Related papers (2023-04-04T15:57:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.