Synthetic Error Injection Fails to Elicit Self-Correction In Language Models
- URL: http://arxiv.org/abs/2512.02389v1
- Date: Tue, 02 Dec 2025 03:57:49 GMT
- Title: Synthetic Error Injection Fails to Elicit Self-Correction In Language Models
- Authors: David X. Wu, Shreyas Kapur, Anant Sahai, Stuart Russell
- Abstract summary: We investigate whether supervised learning with synthetic error injection can induce self-correction abilities in language models. Our approach inserts artificial errors into reasoning chains, masks them, and supervises the model to recognize and correct these mistakes. Our results help explain why on-policy reinforcement learning methods have proven uniquely effective for eliciting self-correction.
- Score: 14.76894432271754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning has become the dominant paradigm for eliciting reasoning and self-correction capabilities in large language models, but its computational expense motivates exploration of alternatives. Inspired by techniques from autonomous driving and robotics, we investigate whether supervised learning with synthetic error injection can induce self-correction abilities in language models. Our approach inserts artificial errors into reasoning chains, masks them, and supervises the model to recognize and correct these mistakes. Despite the intuitive appeal of this method, we find that it fails to significantly improve performance even on simple synthetic tasks across multiple models. Moreover, even when the model catches its own error, it often parrots the original mistake. We find that the distribution shift from synthetic errors to on-policy errors significantly degrades the error-correction capabilities of the fine-tuned model, even with good synthetic coverage of on-policy errors. Our results help explain why on-policy reinforcement learning methods have proven uniquely effective for eliciting self-correction.
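To make the recipe concrete, here is a minimal sketch of this style of supervised error-injection training, assuming a Hugging Face causal LM. The example strings, the corruption, and the masking scheme are illustrative stand-ins, not the paper's exact setup:

```python
# Minimal sketch of supervised error-injection training (illustrative;
# the paper's corruption, masking, and correction formats may differ).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prefix = "Q: 17 + 25 = ?\nStep 1: 17 + 25"            # clean reasoning so far
injected = " = 32\n"                                   # synthetic error
correction = "Wait, that is wrong: 17 + 25 = 42.\n"    # supervised recovery

def ids(s):
    return tok(s, add_special_tokens=False).input_ids

p, e, c = ids(prefix), ids(injected), ids(correction)
input_ids = torch.tensor([p + e + c])
labels = input_ids.clone()
# Mask the prefix and the injected error so the model is never trained to
# *produce* the mistake, only to recognize and correct it.
labels[0, : len(p) + len(e)] = -100

loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()  # an optimizer step would follow in a real training loop
```

The design point to notice is the loss mask: the injected error stays visible in the context but is excluded from the training signal, which is the "insert, mask, supervise" pattern the abstract describes.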
Related papers
- From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model [72.73512218682187]
We introduce ReDiff, a refining-enhanced diffusion framework that teaches the model to identify and correct its own errors. Our approach features a two-stage training process: first, we instill a foundational revision capability by training the model to revise synthetic errors; second, we implement a novel online self-correction loop. This mistake-driven learning endows the model with the crucial ability to revisit and refine its already generated output, effectively breaking the error cascade.
arXiv Detail & Related papers (2025-10-22T06:58:55Z)
- Error Reflection Prompting: Can Large Language Models Successfully Understand Errors? [8.4909975287531]
Chain-of-thought (CoT) methods aim to equip models with a better understanding of the correct procedures for addressing a given task. We propose Error Reflection Prompting (ERP) to further enhance reasoning in language models.
arXiv Detail & Related papers (2025-08-22T18:02:36Z)
- Language Models can perform Single-Utterance Self-Correction of Perturbed Reasoning [4.768151813962547]
Large Language Models (LLMs) have demonstrated impressive mathematical reasoning capabilities. Their performance remains brittle to minor variations in problem description and prompting strategy. To better understand the self-correction capabilities of recent models, we conduct experiments measuring models' ability to self-correct synthetic perturbations injected into their reasoning.
arXiv Detail & Related papers (2025-06-18T21:35:44Z)
- Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE). RISE injects predefined subtle errors into pivotal tokens in reasoning or computation steps to construct hard pairs for error mitigation. Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH with only 4.5K training samples.
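The data construction behind this kind of method can be sketched as pairing an error-free trace with a subtly corrupted one; such pairs would then feed a standard preference-learning objective like DPO. The digit-perturbation heuristic and all names below are illustrative stand-ins for RISE's targeted injection at pivotal tokens:

```python
# Illustrative construction of error-injected preference pairs
# (RISE's actual token selection and training objective differ in detail).
import random

def inject_subtle_error(step: str) -> str:
    """Perturb one digit in a reasoning step -- a crude stand-in for
    targeted error injection at a pivotal token."""
    digit_positions = [i for i, ch in enumerate(step) if ch.isdigit()]
    if not digit_positions:
        return step
    i = random.choice(digit_positions)
    wrong = str((int(step[i]) + random.randint(1, 9)) % 10)
    return step[:i] + wrong + step[i + 1:]

def make_preference_pair(question: str, correct_steps: list) -> dict:
    k = random.randrange(len(correct_steps))  # choose a step to corrupt
    rejected = correct_steps[:k] + [inject_subtle_error(correct_steps[k])]
    return {
        "prompt": question,
        "chosen": "\n".join(correct_steps),   # error-free trace
        "rejected": "\n".join(rejected),      # subtly corrupted trace
    }

pair = make_preference_pair(
    "What is 12 * 15?",
    ["Step 1: 12 * 15 = 12 * 10 + 12 * 5", "Step 2: 120 + 60 = 180"],
)
print(pair["rejected"])
```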
arXiv Detail & Related papers (2024-10-09T07:43:38Z)
- Training Language Models to Self-Correct via Reinforcement Learning [98.35197671595343]
Self-correction has been found to be largely ineffective in modern large language models (LLMs).
We develop a multi-turn online reinforcement learning approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data.
We find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on MATH and HumanEval.
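As a caricature of the multi-turn setup: roll out two attempts and reward the second, with a shaping bonus for improving on the first. SCoRe's actual recipe (multi-stage training, its specific reward shaping) is more involved; the functions below are toy stand-ins:

```python
# Toy two-turn self-correction rollout and reward (illustrative only;
# SCoRe uses a multi-stage RL recipe with its own reward shaping).
def two_turn_reward(generate, is_correct, problem: str) -> float:
    attempt1 = generate(problem)
    attempt2 = generate(
        f"{problem}\nYour previous answer:\n{attempt1}\n"
        "There may be a mistake. Provide a corrected answer."
    )
    r1, r2 = float(is_correct(attempt1)), float(is_correct(attempt2))
    # Credit the second turn, plus a bonus for genuine improvement so the
    # policy is pushed toward correcting rather than merely repeating.
    return r2 + 0.5 * max(r2 - r1, 0.0)

# Stand-in usage with a dummy generator and checker:
reward = two_turn_reward(
    generate=lambda p: "42",
    is_correct=lambda a: a.strip() == "42",
    problem="What is 6 * 7?",
)
print(reward)  # 1.0
```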
arXiv Detail & Related papers (2024-09-19T17:16:21Z)
- Discovering Latent Knowledge in Language Models Without Supervision [72.95136739040676]
Existing techniques for training language models can be misaligned with the truth.
We propose directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way.
We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models.
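The objective in this work is Contrast-Consistent Search (CCS): train a small probe on hidden states so that a statement and its negation receive consistent, confident truth probabilities. A minimal sketch with random stand-in activations (the real method also normalizes activations taken from a particular layer of the model):

```python
# Minimal CCS-style probe training on stand-in activations.
import torch
import torch.nn as nn

d = 768  # hidden size (illustrative)
probe = nn.Sequential(nn.Linear(d, 1), nn.Sigmoid())
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

h_pos = torch.randn(256, d)  # activations for "X? Yes." (stand-in data)
h_neg = torch.randn(256, d)  # activations for "X? No."  (stand-in data)

for _ in range(100):
    p_pos, p_neg = probe(h_pos), probe(h_neg)
    # Consistency: probabilities of a statement and its negation should sum to 1.
    consistency = ((p_pos - (1.0 - p_neg)) ** 2).mean()
    # Confidence: discourage the degenerate solution p = 0.5 everywhere.
    confidence = (torch.minimum(p_pos, p_neg) ** 2).mean()
    loss = consistency + confidence
    opt.zero_grad()
    loss.backward()
    opt.step()
```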
arXiv Detail & Related papers (2022-12-07T18:17:56Z)
- Real-to-Sim: Predicting Residual Errors of Robotic Systems with Sparse Data using a Learning-based Unscented Kalman Filter [65.93205328894608]
We learn the residual errors between a dynamics and/or simulator model and the real robot.
We show that with the learned residual errors, we can further close the reality gap between dynamic models, simulations, and actual hardware.
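The paper couples this idea with a learning-based unscented Kalman filter; as a simplified stand-in, the residual between a nominal model and logged real-robot transitions can be fit with an off-the-shelf regressor and added back at prediction time. All dynamics below are invented for illustration:

```python
# Simplified residual-dynamics learning (a stand-in for the paper's
# learning-based UKF): fit the gap between a nominal model and real data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def nominal_model(x, u):
    return x + 0.1 * u  # crude simulator prediction (illustrative)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))                # logged states
U = rng.normal(size=(200, 1))                # logged actions
X_next = X + 0.12 * U + 0.01 * np.sin(X)     # "real" dynamics (illustrative)

residuals = X_next - nominal_model(X, U)     # what the nominal model misses
gp = GaussianProcessRegressor().fit(np.hstack([X, U]), residuals)

def corrected_model(x, u):
    r = gp.predict(np.hstack([x, u]))
    return nominal_model(x, u) + r.reshape(x.shape)  # close the reality gap

print(corrected_model(np.array([[0.5]]), np.array([[1.0]])))
```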
arXiv Detail & Related papers (2022-09-07T15:15:12Z)
- Grammatical Error Generation Based on Translated Fragments [0.0]
We perform neural machine translation of sentence fragments in order to create large amounts of training data for English grammatical error correction.
Our method aims at simulating mistakes made by second-language learners, and produces a wider range of non-native-style language.
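One way to approximate the fragment-translation idea is to round-trip a span of the sentence through another language and splice the output back, which tends to introduce learner-like errors. The model choice and fragment selection below are illustrative, not the paper's exact pipeline:

```python
# Rough fragment-based error generation via round-trip translation
# (illustrative; the paper's fragment selection and MT setup differ).
from transformers import pipeline

en_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
de_en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

def corrupt_fragment(sentence: str) -> str:
    words = sentence.split()
    mid = len(words) // 2
    fragment = " ".join(words[mid:])           # round-trip only a fragment
    de = en_de(fragment)[0]["translation_text"]
    back = de_en(de)[0]["translation_text"]
    return " ".join(words[:mid] + [back])      # splice the fragment back

print(corrupt_fragment("She has been living in this city for three years."))
```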
arXiv Detail & Related papers (2021-04-20T12:43:40Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)