Fugu-MT Paper Translation (Abstract): Can Large Models Teach Student Models to Solve Mathematical Problems Like Human Beings? A Reasoning Distillation Method via Multi-LoRA Interaction

Paper summary: Can Large Models Teach Student Models to Solve Mathematical Problems Like Human Beings? A Reasoning Distillation Method via Multi-LoRA Interaction

arxiv url: http://arxiv.org/abs/2508.13037v1
Date: Mon, 18 Aug 2025 15:56:10 GMT
Status: Translation complete
Last updated in system: 2025-08-19 14:49:11.467658
Title: Can Large Models Teach Student Models to Solve Mathematical Problems Like Human Beings? A Reasoning Distillation Method via Multi-LoRA Interaction
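Title (reference translation): Can large models teach student models to solve mathematical problems the way human beings do? A reasoning distillation method via multi-LoRA interaction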
Authors: Xinhe Li, Jiajun Liu, Peng Wang
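Abstract summary: Large Language Models (LLMs) have strong mathematical reasoning abilities but rely on hundreds of billions of parameters. Existing methods typically leverage LLMs to generate massive amounts of data for cramming training. The authors propose a novel method based on multi-LoRA Interaction for mathematical reasoning Distillation (LoRID). LoRID achieves state-of-the-art performance, especially on the GSM8K dataset.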
Reference score (independently computed attention score): 6.695255921627406
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent studies have demonstrated that Large Language Models (LLMs) have strong mathematical reasoning abilities but rely on hundreds of billions of parameters. To tackle the challenge of poor reasoning in Small Language Models (SLMs), existing methods typically leverage LLMs to generate massive amounts of data for cramming training. In psychology, they are akin to System 1 thinking, which resolves reasoning problems rapidly based on experience and intuition. However, human learning also requires System 2 thinking, where knowledge is first acquired and then reinforced through practice. Inspired by such two distinct modes of thinking, we propose a novel method based on the multi-LoRA Interaction for mathematical reasoning Distillation (LoRID). First, we input the question and reasoning of each sample into an LLM to create knowledge-enhanced datasets. Subsequently, we train a LoRA block on the student model as an Intuitive Reasoner (IR), which directly generates Chain-of-Thoughts for problem-solving. Then, to imitate System 2 thinking, we train the Knowledge Generator (KG) and Deep Reasoner (DR), respectively. The former outputs only knowledge after receiving problems, while the latter uses that knowledge to perform reasoning. Finally, to address the randomness in the generation of IR and DR, we evaluate whether their outputs are consistent, and the inference process needs to be iterated if not. This step can enhance the mathematical reasoning ability of SLMs through mutual feedback. Experimental results show that LoRID achieves state-of-the-art performance, especially on the GSM8K dataset, where it outperforms the second-best method by 2.3%, 16.1%, 2.4%, 12.3%, and 1.8% accuracy across the five base models, respectively.
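Abstract (reference translation): Recent studies have shown that Large Language Models (LLMs) have strong mathematical reasoning abilities but depend on hundreds of billions of parameters. To address the poor reasoning of Small Language Models (SLMs), existing methods generally use LLMs to generate large amounts of data for cramming training. In psychological terms, these methods resemble System 1 thinking, which solves reasoning problems quickly on the basis of experience and intuition. Human learning, however, also requires System 2 thinking, in which knowledge is first acquired and then reinforced through practice. Inspired by these two distinct modes of thinking, the authors propose a novel method based on multi-LoRA Interaction for mathematical reasoning Distillation (LoRID). First, the question and reasoning of each sample are fed into an LLM to create knowledge-enhanced datasets. Next, a LoRA block is trained on the student model as an Intuitive Reasoner (IR), which directly generates Chain-of-Thoughts for problem solving. Then, to imitate System 2 thinking, a Knowledge Generator (KG) and a Deep Reasoner (DR) are trained. The former outputs only knowledge after receiving a problem, while the latter uses that knowledge to perform reasoning. Finally, to address the randomness in the outputs of the IR and the DR, their outputs are checked for consistency, and the inference process is repeated if they disagree. This step can improve the mathematical reasoning ability of SLMs through mutual feedback. Experiments show that LoRID achieves state-of-the-art performance, especially on the GSM8K dataset, where it outperforms the second-best method by 2.3%, 16.1%, 2.4%, 12.3%, and 1.8% accuracy across the five base models, respectively.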
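The inference-time interaction described in the abstract (the IR answers directly, the KG produces knowledge, the DR reasons with that knowledge, and the process repeats until the two answers agree) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the exact-match consistency test, the `max_iters` cap, and the `None` fallback are all hypothetical, and the three reasoners are replaced by scripted stand-ins rather than LoRA-adapted models.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: each callable stands in for the student model with
# one LoRA block active. These names do not come from the paper.
IntuitiveReasoner = Callable[[str], str]       # question -> final answer
KnowledgeGenerator = Callable[[str], str]      # question -> knowledge text
DeepReasoner = Callable[[str, str], str]       # (question, knowledge) -> final answer


def lorid_style_inference(
    question: str,
    ir: IntuitiveReasoner,
    kg: KnowledgeGenerator,
    dr: DeepReasoner,
    max_iters: int = 5,  # assumption: the abstract does not state an iteration cap
) -> Tuple[Optional[str], int]:
    """Repeat IR and KG -> DR inference until their final answers agree."""
    for attempt in range(1, max_iters + 1):
        intuitive_answer = ir(question).strip()     # System 1 style: direct answer
        knowledge = kg(question)                    # System 2 style: knowledge first
        deep_answer = dr(question, knowledge).strip()
        # Assumption: consistency is checked by exact match of final answers.
        if intuitive_answer == deep_answer:
            return intuitive_answer, attempt
    # Assumption: report no answer if the reasoners never agree.
    return None, max_iters


def make_scripted(outputs):
    """Build a toy reasoner that returns pre-scripted outputs in order."""
    it = iter(outputs)
    return lambda *args: next(it)


# Toy stand-ins: the IR is wrong on its first try, the DR is right both times.
toy_ir = make_scripted(["8", "7"])
toy_kg = lambda q: "Addition combines two quantities into a total."
toy_dr = make_scripted(["7", "7"])

answer, attempts = lorid_style_inference("What is 3 + 4?", toy_ir, toy_kg, toy_dr)
print(answer, attempts)  # prints: 7 2
```

In a real setting the scripted stand-ins would be replaced by sampling from the student model with the corresponding LoRA block enabled, and the consistency check would compare the final answers extracted from each generated chain of thought.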

Related papers