Fugu-MT paper translation (abstract): The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning

Paper abstract: The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning

arxiv url: http://arxiv.org/abs/2606.06920v1
Date: Fri, 05 Jun 2026 05:34:13 GMT
Status: Translation complete
System update date: 2026-06-08 14:33:29.579384
Title: The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning
Title (reference translation): Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning
Authors: Rahul Nair, Chun Tao
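Abstract summary: Full Fine-Tuning (Full FT) actively harms the performance of models under 300M parameters. A comparison of Low-Rank Adaptation (LoRA) and Weight-Decomposed LoRA (DoRA) shows that their strengths vary by task. The paper recommends defaulting to PEFT for all aligned sub-1B models and cautions against Full FT for any architecture smaller than 500M parameters.
Reference score (independently computed attention score): 8.166960747155136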
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deploying Small Language Models (SLMs) on edge devices requires efficient fine-tuning strategies that adapt models to new tasks without degrading their general capabilities. In this study, we benchmark five sub-1B models (135M-1B) on mathematical reasoning tasks and uncover a critical vulnerability: Full Fine-Tuning (Full FT) actively harms performance in models under 300M parameters, often dropping accuracy below zero-shot baselines. This "negative transfer" makes Parameter-Efficient Fine-Tuning (PEFT) not just an efficiency preference, but a stability requirement. We find that while Low-Rank Adaptation (LoRA) and Weight-Decomposed LoRA (DoRA) perform comparably, their strengths vary by task; DoRA excels in complex reasoning (GSM8K), while LoRA dominates pattern matching (OrcaMath). In particular, Full FT is outperformed by LoRA on aligned models (Qwen2.5-0.5B) and even by simple 5-shot In-Context Learning on the smallest architectures (SmolLM2-135M). Based on these findings, we recommend defaulting to PEFT for all aligned sub-1B models and caution against Full FT for any architecture smaller than 500M parameters to prevent catastrophic forgetting. Reproduction of this work can be found at https://github.com/gulguluu/tiny-slm-finetune-compare.
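Abstract (reference translation): Deploying Small Language Models (SLMs) on edge devices requires efficient fine-tuning strategies that adapt models to new tasks without degrading their general capabilities. This study benchmarks five sub-1B models (135M-1B) on mathematical reasoning tasks and uncovers a critical vulnerability: Full Fine-Tuning (Full FT) actively harms performance in models under 300M parameters, often dropping accuracy below zero-shot baselines. This "negative transfer" makes Parameter-Efficient Fine-Tuning (PEFT) a stability requirement rather than merely an efficiency preference. Low-Rank Adaptation (LoRA) and Weight-Decomposed LoRA (DoRA) perform comparably overall, but their strengths vary by task: DoRA excels at complex reasoning (GSM8K), while LoRA dominates pattern matching (OrcaMath). Notably, Full FT is outperformed by LoRA on aligned models (Qwen2.5-0.5B), and even by simple 5-shot In-Context Learning on the smallest architectures (SmolLM2-135M). Based on these results, the authors recommend defaulting to PEFT for all aligned sub-1B models and caution against Full FT for any architecture smaller than 500M parameters, to prevent catastrophic forgetting. A reproduction of this work is available at https://github.com/gulguluu/tiny-slm-finetune-compare.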

Related papers