Related papers: ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement

ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement

URL: http://arxiv.org/abs/2410.02108v1
Date: Thu, 3 Oct 2024 00:09:15 GMT
Title: ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
Authors: Xiangyu Peng, Congying Xia, Xinyi Yang, Caiming Xiong, Chien-Sheng Wu, Chen Xing,
Abstract summary: Post-training Large Language Models (LLMs) with explicit reasoning trajectories can enhance their reasoning abilities. Existing self-synthesizing methods suffer from poor generalization to out-of-domain (OOD) reasoning tasks. We propose Reasoning Generalist via Self-Improvement (ReGenesis), a method to self-synthesize reasoning paths as post-training data.
Score: 70.09541267910974
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Post-training Large Language Models (LLMs) with explicit reasoning trajectories can enhance their reasoning abilities. However, acquiring such high-quality trajectory data typically demands meticulous supervision from humans or superior models, which can be either expensive or license-constrained. In this paper, we explore how far an LLM can improve its reasoning by self-synthesizing reasoning paths as training data without any additional supervision. Existing self-synthesizing methods, such as STaR, suffer from poor generalization to out-of-domain (OOD) reasoning tasks. We hypothesize it is due to that their self-synthesized reasoning paths are too task-specific, lacking general task-agnostic reasoning guidance. To address this, we propose Reasoning Generalist via Self-Improvement (ReGenesis), a method to self-synthesize reasoning paths as post-training data by progressing from abstract to concrete. More specifically, ReGenesis self-synthesizes reasoning paths by converting general reasoning guidelines into task-specific ones, generating reasoning structures, and subsequently transforming these structures into reasoning paths, without the need for human-designed task-specific examples used in existing methods. We show that ReGenesis achieves superior performance on all in-domain and OOD settings tested compared to existing methods. For six OOD tasks specifically, while previous methods exhibited an average performance decrease of approximately 4.6% after post training, ReGenesis delivers around 6.1% performance improvement. We also conduct in-depth analysis of our framework and show ReGenesis is effective across various LLMs and design choices.

Related papers

How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns [51.02752099869218]
Large Language Models (LLMs) display strikingly different generalization behaviors.<n>We introduce a novel benchmark that decomposes reasoning into atomic core skills.<n>We show that RL-tuned models maintain more stable behavioral profiles and resist collapse in reasoning skills, whereas SFT models exhibit sharper drift and overfit to surface patterns.
arXiv Detail & Related papers (2025-12-30T08:16:20Z)
When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents [2.689316553293938]
Supervised fine-tuning (SFT) has emerged as one of the most effective ways to improve the performance of large language models (LLMs) in downstream tasks.<n>We propose a pipeline in which LLMs generate reasoning steps that guide both the invocation of tools and the final answer generation for conversational agents.
arXiv Detail & Related papers (2025-12-12T04:44:40Z)
From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs [33.17712742134723]
We propose a framework that shifts from implicit exploration to structured reasoning through guideline and refinement.<n>First, we extract structured reasoning patterns from successful trajectories and reflective signals from failures.<n>During inference, the model follows these guidelines step-by-step, with refinement applied after each step to correct errors and stabilize the reasoning process.
arXiv Detail & Related papers (2025-09-08T02:11:49Z)
Thinking About Thinking: SAGE-nano's Inverse Reasoning for Self-Aware Language Models [0.0]
Large Language Models (LLMs) have demonstrated remarkable capabilities at solving complex reasoning tasks with Chain-of-Thought prompting.<n>We introduce textbfinverse reasoning, a novel paradigm enabling LLMs to decompose and explain their own reasoning chains post-hoc.<n>Our work creates new avenues for transparent AI systems and closes significant gaps in AI safety, education, and scientific discovery.
arXiv Detail & Related papers (2025-06-30T09:53:41Z)
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs [54.154593699263074]
ProtoReasoning is a framework that enhances the reasoning ability of Large Reasoning Models.<n>ProtoReasoning transforms problems into corresponding prototype representations.<n>ProtoReasoning achieves 4.7% improvement over baseline models on logical reasoning.
arXiv Detail & Related papers (2025-06-18T07:44:09Z)
Self-Training Large Language Models with Confident Reasoning [15.260831996769962]
Large language models (LLMs) have shown impressive performance by generating reasoning paths before final answers.<n>We propose a new self-training method, CORE-PO, that fine-tunes LLMs to prefer high-COnfidence REasoning paths through Policy Optimization.<n>Our experiments show that CORE-PO improves the accuracy of outputs on four in-distribution and two out-of-distribution benchmarks, compared to existing self-training methods.
arXiv Detail & Related papers (2025-05-23T04:25:10Z)
The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency. UPFT removes the need for labeled data or exhaustive sampling. Experiments show that UPFT matches the performance of supervised methods.
arXiv Detail & Related papers (2025-03-04T18:56:03Z)
Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning [52.83539473110143]
We introduce a novel structure-oriented analysis method to help Large Language Models (LLMs) better understand a question. To further improve the reliability in complex question-answering tasks, we propose a multi-agent reasoning system, Structure-oriented Autonomous Reasoning Agents (SARA) Extensive experiments verify the effectiveness of the proposed reasoning system. Surprisingly, in some cases, the system even surpasses few-shot methods.
arXiv Detail & Related papers (2024-10-18T05:30:33Z)
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning [41.9992614617405]
We introduce RATIONALYST, a model for process-supervision of reasoning based on pre-training. We extract 79k rationales from web-scale unlabelled dataset (the Pile) and a combination of reasoning datasets with minimal human intervention. Fine-tuned from LLaMa-3-8B, RATIONALYST improves the accuracy of reasoning by an average of 3.9% on 7 representative reasoning benchmarks.
arXiv Detail & Related papers (2024-10-01T20:05:51Z)
Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales. We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z)
SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning [29.514755268807868]
We propose SEER, a novel method that maximizes a structure-based return to facilitate structured reasoning and explanation. Our proposed structure-based return precisely describes the hierarchical and branching structure inherent in structured reasoning. Our experiments show that SEER significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-24T06:10:51Z)
LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs) Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models. We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z)
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning. Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z)
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of model.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)
Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning [14.663216851932646]
We show that language models tend to perform fairly well at single step inference tasks, but struggle to chain together multiple reasoning steps to solve more complex problems. We propose a Selection-Inference (SI) framework that exploits pre-trained LLMs as general processing modules. We show that a 7B parameter LLM used within the SI framework in a 5-shot generalisation setting, with no fine-tuning, yields a performance improvement of over 100%.
arXiv Detail & Related papers (2022-05-19T17:25:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.