Not the Example, but the Process: How Self-Generated Examples Enhance LLM Reasoning
- URL: http://arxiv.org/abs/2602.15863v1
- Date: Mon, 26 Jan 2026 10:28:52 GMT
- Title: Not the Example, but the Process: How Self-Generated Examples Enhance LLM Reasoning
- Authors: Daehoon Gwak, Minseo Jung, Junwoo Park, Minho Park, ChaeHun Park, Junha Hyung, Jaegul Choo
- Abstract summary: We argue that the key benefit arises not from the generated examples themselves but from the act of creating them. We evaluate three prompting strategies for in-context learning: Zero-shot prompting, Integrated prompting, and Decoupled prompting. We conclude that the advantage of self-generation prompting comes from the process of problem creation, not the examples themselves.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have shown that Large Language Models (LLMs) can improve their reasoning performance through self-generated few-shot examples, achieving results comparable to manually curated in-context examples. However, the underlying mechanism behind these gains remains unclear, making it hard to decide when and how to apply the technique effectively. In this work, we argue that the key benefit arises not from the generated examples themselves but from the act of creating them. To validate this, on reasoning-intensive tasks across diverse LLM architectures, we systematically evaluate three prompting strategies for in-context learning: (1) Zero-shot prompting; (2) Integrated prompting, where LLMs create and solve problems within a single, unified prompt; and (3) Decoupled prompting, where self-generated examples are reused as in-context examples, but the context of their creation itself is excluded. We conduct experiments across five widely used model architectures, demonstrating that Integrated prompting consistently outperforms both Zero-shot and Decoupled prompting. In contrast, Decoupled prompting offers only marginal gains over Zero-shot. Further, for a more in-depth analysis, we conduct an attention analysis and observe significant differences in attention patterns between Integrated and Decoupled prompting. These findings suggest that the advantage of self-generation prompting comes from the process of problem creation, not the examples themselves, providing valuable insights for designing more effective prompting strategies.
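The three strategies compared in the abstract differ only in how the prompt is assembled. A minimal sketch of that difference is below; the function names, prompt wording, and the `n_examples` parameter are illustrative assumptions, not the paper's exact templates, and the resulting strings would be sent to any chat-completion LLM API.

```python
def zero_shot_prompt(question: str) -> str:
    # (1) Zero-shot: the model sees only the target question.
    return f"Solve the following problem step by step.\n\nProblem: {question}"

def integrated_prompt(question: str, n_examples: int = 2) -> str:
    # (2) Integrated: the model is asked to create and solve its own
    # examples and the target problem within a single, unified prompt,
    # so the creation process itself stays in context.
    return (
        f"First, create {n_examples} example problems similar to the one "
        "below and solve each of them. Then solve the target problem "
        "step by step.\n\n"
        f"Target problem: {question}"
    )

def decoupled_prompt(question: str, generated_examples: list[str]) -> str:
    # (3) Decoupled: previously self-generated examples are pasted in as
    # ordinary few-shot demonstrations; the context in which they were
    # created is discarded.
    demos = "\n\n".join(generated_examples)
    return (
        f"{demos}\n\n"
        f"Now solve the following problem step by step.\n\nProblem: {question}"
    )
```

Under this framing, the paper's finding is that (2) beats (3) even though both expose the model to the same self-generated examples; only (2) keeps the act of creating them inside the context window.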
Related papers
- From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs [58.02809208460186]
We revisit this paradox using high-quality traces from DeepSeek-R1 as demonstrations. We find that adding more exemplars consistently degrades accuracy, even when demonstrations are optimal. We introduce Insight-to-solve (I2S), a sequential test-time procedure that turns demonstrations into explicit, reusable insights.
arXiv Detail & Related papers (2025-09-27T08:59:31Z)
- TORSO: Template-Oriented Reasoning Towards General Tasks [23.681707595200265]
We introduce Template-Oriented Reasoning (TORSO), which elicits the model to utilize its internal reasoning abilities to generate proper responses across various tasks without the need for manually crafted few-shot examples. Our experimental results demonstrate that TORSO achieves strong performance on diverse LLM benchmarks with reasonable rationales.
arXiv Detail & Related papers (2025-09-11T13:31:35Z)
- ReaLM: Reflection-Enhanced Autonomous Reasoning with Small Language Models [76.28894983518164]
Small Language Models (SLMs) are a cost-effective alternative to Large Language Models (LLMs), but they often struggle with complex reasoning due to their limited capacity and a tendency to produce mistakes or inconsistent answers. We introduce ReaLM, a reinforcement learning framework for robust and self-sufficient reasoning in vertical domains.
arXiv Detail & Related papers (2025-08-17T14:50:23Z)
- Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking for Complex Instruction Following [10.119219532863767]
Lazy reasoning during the thinking stage is the primary factor contributing to poor instruction adherence. We propose a comprehensive framework designed to enable rigorous reasoning processes involving preview and self-checking. Our Light-IF-32B model surpasses both larger open-source models such as DeepSeek-R1 and closed-source models like Doubao-1.6.
arXiv Detail & Related papers (2025-08-05T07:42:00Z)
- When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs [19.354141845315276]
Chain-of-thought reasoning can significantly degrade instruction-following accuracy. This is the first work to systematically expose reasoning-induced failures in instruction following.
arXiv Detail & Related papers (2025-05-16T16:36:00Z)
- What Makes In-context Learning Effective for Mathematical Reasoning: A Theoretical Analysis [81.15503859645149]
In this paper, we aim to theoretically analyze the impact of in-context demonstrations on large language models' reasoning performance. We propose a straightforward, generalizable, and low-complexity demonstration selection method named LMS3.
arXiv Detail & Related papers (2024-12-11T11:38:11Z)
- Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning [13.381974811214764]
Reasoning Graph-enhanced Exemplar Retrieval (RGER) uses a graph kernel to select exemplars with semantic and structural similarity. Our code is released at https://github.com/Yukang-Lin/RGER.
arXiv Detail & Related papers (2024-09-17T12:58:29Z)
- Relevant or Random: Can LLMs Truly Perform Analogical Reasoning? [44.158548608820624]
We show that self-generated examples can achieve comparable or even better performance on certain tasks. We find that the accuracy of self-generated examples is the key factor and subsequently design two novel methods.
arXiv Detail & Related papers (2024-04-19T09:15:07Z)
- RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning. We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches learnable baselines.
arXiv Detail & Related papers (2023-05-23T20:15:56Z)
- Paired Examples as Indirect Supervision in Latent Decision Models [109.76417071249945]
We introduce a way to leverage paired examples that provide stronger cues for learning latent decisions.
We apply our method to improve compositional question answering using neural module networks on the DROP dataset.
arXiv Detail & Related papers (2021-04-05T03:58:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.