Can Large Models Teach Student Models to Solve Mathematical Problems Like Human Beings? A Reasoning Distillation Method via Multi-LoRA Interaction
- URL: http://arxiv.org/abs/2508.13037v1
- Date: Mon, 18 Aug 2025 15:56:10 GMT
- Title: Can Large Models Teach Student Models to Solve Mathematical Problems Like Human Beings? A Reasoning Distillation Method via Multi-LoRA Interaction
- Authors: Xinhe Li, Jiajun Liu, Peng Wang,
- Abstract summary: Large Language Models (LLMs) have strong mathematical reasoning abilities but rely on hundreds of billions of parameters.<n>Existing methods typically leverage LLMs to generate massive amounts of data for cramming training.<n>We propose a novel method based on the multi-LoRA Interaction for mathematical reasoning Distillation (LoRID)<n>LoRID achieves state-of-the-art performance, especially on the GSM8K dataset.
- Score: 6.695255921627406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have demonstrated that Large Language Models (LLMs) have strong mathematical reasoning abilities but rely on hundreds of billions of parameters. To tackle the challenge of poor reasoning in Small Language Models (SLMs), existing methods typically leverage LLMs to generate massive amounts of data for cramming training. In psychology, they are akin to System 1 thinking, which resolves reasoning problems rapidly based on experience and intuition. However, human learning also requires System 2 thinking, where knowledge is first acquired and then reinforced through practice. Inspired by such two distinct modes of thinking, we propose a novel method based on the multi-LoRA Interaction for mathematical reasoning Distillation (LoRID). First, we input the question and reasoning of each sample into an LLM to create knowledge-enhanced datasets. Subsequently, we train a LoRA block on the student model as an Intuitive Reasoner (IR), which directly generates Chain-of-Thoughts for problem-solving. Then, to imitate System 2 thinking, we train the Knowledge Generator (KG) and Deep Reasoner (DR), respectively. The former outputs only knowledge after receiving problems, while the latter uses that knowledge to perform reasoning. Finally, to address the randomness in the generation of IR and DR, we evaluate whether their outputs are consistent, and the inference process needs to be iterated if not. This step can enhance the mathematical reasoning ability of SLMs through mutual feedback. Experimental results show that LoRID achieves state-of-the-art performance, especially on the GSM8K dataset, where it outperforms the second-best method by 2.3%, 16.1%, 2.4%, 12.3%, and 1.8% accuracy across the five base models, respectively.
Related papers
- Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability [25.507069397981194]
We show that it is possible to realize bi-level meta-RL that unlocks learning under sparse, binary rewards by sharpening a latent capacity of pretrained models to generate useful stepping stones.<n>Our results suggest that the ability to generate useful stepping stones does not require the ability to actually solve the hard problems.
arXiv Detail & Related papers (2026-01-26T18:46:56Z) - Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces [2.0789230137053014]
Test-time scaling has enabled a new class of Large Language Models (LLMs) that are able to reason through complex problems.<n>We compare the performance of medium-sized LLMs on Math problems after post-training on two kinds of reasoning traces.
arXiv Detail & Related papers (2025-11-24T17:26:58Z) - CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models [56.40065909544213]
Large language models (LLMs) benefit from increased test-time compute, a phenomenon known as test-time scaling.<n>However, reasoning-optimized models often overthink even simple problems, producing excessively verbose outputs and leading to low token efficiency.<n>We identify two key causes of this verbosity: (1) reinforcement learning reduces the information density of forward reasoning, and (2) backward chain-of thought training encourages redundant and often unnecessary verification steps.
arXiv Detail & Related papers (2025-05-28T06:24:45Z) - Interleaved Reasoning for Large Language Models via Reinforcement Learning [22.403928213802036]
Long chain-of-thought (CoT) enhances large language models' (LLM) reasoning capabilities.<n>We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions.
arXiv Detail & Related papers (2025-05-26T07:58:17Z) - Thinkless: LLM Learns When to Think [57.857534644932194]
Reasoning Language Models, capable of extended chain-of-thought reasoning, have demonstrated remarkable performance on tasks requiring complex logical inference.<n>We propose Thinkless, a learnable framework that empowers an LLM to adaptively select between short-form and long-form reasoning.<n>On several benchmarks such as Minerva Algebra, MATH-500, and GSM8K, Thinkless is able to reduce the usage of long-chain thinking by 50% - 90%.
arXiv Detail & Related papers (2025-05-19T17:24:16Z) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
textbfR1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models.<n>Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start.<n>Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z) - Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs [76.43407125275202]
o1-like models can emulate human-like long-time thinking during inference.<n>This paper presents the first comprehensive study on the prevalent issue of overthinking in these models.<n>We propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy.
arXiv Detail & Related papers (2024-12-30T18:55:12Z) - System-2 Mathematical Reasoning via Enriched Instruction Tuning [13.672967091915181]
Enriched Instruction Tuning (EIT) is a method that enriches existing human-annotated mathematical datasets by synergizing human and AI feedback.<n>EIT achieves an accuracy of 84.1% on GSM8K and 32.5% on MATH, surpassing state-of-the-art fine-tuning and prompting methods.
arXiv Detail & Related papers (2024-12-22T10:49:27Z) - Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review [11.756344944226495]
We introduce a novel Fault-Aware DistIllation via Peer-Review (FAIR) approach.<n>Instead of merely obtaining rationales from teachers, our method asks teachers to identify and explain the student's mistakes.<n>Our method reduces the chance of teachers guessing correctly with flawed rationale.
arXiv Detail & Related papers (2024-10-04T17:59:41Z) - MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z) - Democratizing Reasoning Ability: Tailored Learning from Large Language
Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability to smaller LMs.
We exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm.
To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z) - Exploring Equation as a Better Intermediate Meaning Representation for
Numerical Reasoning [53.2491163874712]
We use equations as IMRs to solve the numerical reasoning task.
We present a method called Boosting Numerical Reasontextbfing by Decomposing the Generation of Equations (Bridge)
Our method improves the performance by 2.2%, 0.9%, and 1.7% on GSM8K, SVAMP, and Algebra datasets.
arXiv Detail & Related papers (2023-08-21T09:35:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.