Rethinking Reflection in Pre-Training
- URL: http://arxiv.org/abs/2504.04022v1
- Date: Sat, 05 Apr 2025 02:24:07 GMT
- Title: Rethinking Reflection in Pre-Training
- Authors: Essential AI: Darsh J Shah, Peter Rushton, Somanshu Singla, Mohit Parmar, Kurt Smith, Yash Vanjani, Ashish Vaswani, Adarsh Chaluvaraju, Andrew Hojel, Andrew Ma, Anil Thomas, Anthony Polloreno, Ashish Tanwer, Burhan Drak Sibai, Divya S Mansingka, Divya Shivaprasad, Ishaan Shah, Karl Stratos, Khoi Nguyen, Michael Callahan, Michael Pust, Mrinal Iyer, Philip Monk, Platon Mazarakis, Ritvik Kapila, Saurabh Srivastava, Tim Romanski
- Abstract summary: A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. We introduce deliberate errors into chains-of-thought and test whether the model can still arrive at the correct answer by recognizing and correcting these mistakes. By tracking performance across different stages of pre-training, we observe that this self-correcting ability appears early and improves steadily over time.
- Score: 13.372316326688722
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually begins to emerge much earlier, during the model's pre-training. To study this, we introduce deliberate errors into chains-of-thought and test whether the model can still arrive at the correct answer by recognizing and correcting these mistakes. By tracking performance across different stages of pre-training, we observe that this self-correcting ability appears early and improves steadily over time. For instance, an OLMo2-7B model pre-trained on 4 trillion tokens displays self-correction on our six self-reflection tasks.
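The error-injection probe the abstract describes is straightforward to sketch. Below is a minimal illustration, assuming a generic `generate(prompt) -> str` completion function; `inject_error`, `reflection_probe`, and the digit-bumping corruption rule are hypothetical stand-ins, not the authors' exact protocol:

```python
import re

def inject_error(chain_of_thought: str) -> str:
    """Corrupt the first number in a chain-of-thought by one,
    yielding a deliberately flawed reasoning prefix."""
    def bump(match: re.Match) -> str:
        return str(int(match.group()) + 1)
    return re.sub(r"\d+", bump, chain_of_thought, count=1)

def reflection_probe(question: str, correct_cot: str, answer: str, generate) -> bool:
    """Return True if the model recovers the correct final answer even
    though its reasoning prefix contains an injected error."""
    corrupted = inject_error(correct_cot)
    prompt = f"{question}\n{corrupted}\n"  # the model must notice and repair the flaw
    completion = generate(prompt)
    return answer in completion

# Toy usage with a stand-in "model" that notices the flaw and restates the answer.
if __name__ == "__main__":
    q = "What is 12 + 7 + 5?"
    cot = "12 + 7 = 19, and 19 + 5 = 24."
    fake_model = lambda p: "Wait, 12 + 7 is 19, not 14, so the total is 24."
    print("self-corrected:", reflection_probe(q, cot, "24", fake_model))
```

Tracking the fraction of such probes a checkpoint passes at successive pre-training token counts gives the kind of curve the abstract describes.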
Related papers
- Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs [28.565225092457897]
Reinforcement learning can drive self-improvement in language models on verifiable tasks. We find that Qwen-2.5-3B far exceeds Llama-3.2-3B under identical RL training for the game of Countdown. Our study reveals that Qwen naturally exhibits these reasoning behaviors, whereas Llama initially lacks them.
arXiv Detail & Related papers (2025-03-03T08:46:22Z)
- ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification [53.80183105328448]
Refine via Intrinsic Self-Verification (ReVISE) is an efficient framework that enables LLMs to self-correct their outputs through self-verification.
Our experiments on various reasoning tasks demonstrate that ReVISE achieves efficient self-correction and significantly improves reasoning performance.
arXiv Detail & Related papers (2025-02-20T13:50:02Z)
- Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision [120.40788744292739]
We propose a two-player paradigm that separates the roles of reasoning and critique models.
We first propose AutoMathCritique, an automated and scalable framework for collecting critique data.
We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
arXiv Detail & Related papers (2024-11-25T17:11:54Z)
- SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction [89.56181323849512]
SuperCorrect is a novel two-stage framework that uses a large teacher model to supervise and correct both the reasoning and reflection processes of a smaller student model. In the first stage, we extract hierarchical high-level and detailed thought templates from the teacher model to guide the student model in eliciting more fine-grained reasoning thoughts. In the second stage, we introduce cross-model collaborative direct preference optimization (DPO) to enhance the self-correction abilities of the student model.
arXiv Detail & Related papers (2024-10-11T17:25:52Z)
- Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks [43.96835245022083]
Self-correction that instructs models to refine their outputs presents a promising solution to this issue.
This study investigates the self-correction capabilities of Vision-Language Models during both inference and fine-tuning stages.
arXiv Detail & Related papers (2024-10-05T06:28:54Z)
- Training Language Models to Self-Correct via Reinforcement Learning [98.35197671595343]
Self-correction has been found to be largely ineffective in modern large language models (LLMs).
We develop a multi-turn online reinforcement learning approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data.
We find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% on MATH and 9.1% on HumanEval.
arXiv Detail & Related papers (2024-09-19T17:16:21Z)
- A Theoretical Understanding of Self-Correction through In-context Alignment [51.622068973630796]
Large language models (LLMs) are capable of improving their abilities purely by self-correction.
We show that when LLMs provide relatively accurate self-examinations as rewards, they are capable of refining their responses in context.
Inspired by these findings, we also illustrate applications of self-correction, such as defending against LLM jailbreaks.
arXiv Detail & Related papers (2024-05-28T22:33:02Z)
- Small Language Models Need Strong Verifiers to Self-Correct Reasoning [69.94251699982388]
Self-correction has emerged as a promising solution to boost the reasoning performance of large language models (LLMs).
This work explores whether small (≤ 13B) language models (LMs) can self-correct on reasoning tasks with minimal input from stronger LMs.
arXiv Detail & Related papers (2024-04-26T03:41:28Z)
- A study on the plasticity of neural networks [21.43675319928863]
We discuss the implications of losing plasticity for continual learning.
We show that a model pretrained on data from the same distribution as its fine-tuning data might not reach the same generalisation as a freshly initialised model.
arXiv Detail & Related papers (2021-05-31T18:21:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.