Fugu-MT Paper Translation (Summary): Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation

Paper summary: Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation

arxiv url: http://arxiv.org/abs/2605.07111v1
Date: Fri, 08 May 2026 01:38:58 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:38.724058
Title: Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation
Authors: Haozhan Tang, Xiuqi Zhu, Xinyin Zhang, Boxun Li, Virginia Smith, Kevin Kuo
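Abstract summary: Full Fine-Tuning (FFT) provides the representational plasticity required for high-entropy knowledge injection. Low-Rank Adaptation (LoRA) can match or surpass FFT performance because many tasks only require updates in a low-rank space and benefit from LoRA's additional regularization. We propose Mixture of LoRA and Full (MoLF) Fine-Tuning, a unified framework that enables continuous navigation between the two training regimes.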
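The summary attributes LoRA's competitiveness to many tasks needing only low-rank updates. A minimal sketch of what "low-rank" means here, assuming the standard LoRA parameterization ΔW = B·A with B of shape d_out × r and A of shape r × d_in (the usual α/r scaling is omitted because a nonzero scale does not change rank). The sizes and integer entries are arbitrary illustrations, and exact Fraction arithmetic keeps the rank computation free of rounding issues; none of this is taken from the paper's code.

```python
# Minimal sketch (standard LoRA parameterization, not the paper's code):
# a LoRA update Delta_W = B @ A has rank at most r and needs r * (d_in + d_out)
# trainable numbers, while a dense FFT update of the same matrix needs d_out * d_in.
import random
from fractions import Fraction


def matmul(X, Y):
    """Plain-Python matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def rank(M):
    """Exact matrix rank via Gauss-Jordan elimination over Fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    rk = 0
    for c in range(n_cols):
        pivot = next((i for i in range(rk, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(n_rows):
            if i != rk and rows[i][c] != 0:
                f = rows[i][c] / rows[rk][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
        if rk == n_rows:
            break
    return rk


random.seed(0)
d_out, d_in, r = 6, 8, 2  # arbitrary illustrative sizes
B = [[random.randint(-3, 3) for _ in range(r)] for _ in range(d_out)]
A = [[random.randint(-3, 3) for _ in range(d_in)] for _ in range(r)]
delta_lora = matmul(B, A)  # low-rank update
delta_full = [[random.randint(-3, 3) for _ in range(d_in)] for _ in range(d_out)]  # dense update

print("LoRA update rank:", rank(delta_lora), "(at most r =", r, ")")
print("dense update rank:", rank(delta_full))
print("trainable numbers, LoRA vs dense:", r * (d_in + d_out), "vs", d_out * d_in)
```

For these toy sizes the LoRA factors hold 28 numbers versus 48 for a dense update; the gap widens quickly at realistic layer widths, since r(d_in + d_out) grows linearly in the width while d_out·d_in grows quadratically.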
Reference score (in-house attention score): 15.4865294569737
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent literature on fine-tuning Large Language Models highlights a fundamental debate. While Full Fine-Tuning (FFT) provides the representational plasticity required for high-entropy knowledge injection, Low-Rank Adaptation (LoRA) can match or surpass FFT performance because many tasks only require updates in a low-rank space and benefit from LoRA's additional regularization. Through empirical evaluation across diverse tasks (SQL, Medical QA, and Counterfactual Knowledge) and varying language models (Gemma-3-1B, Qwen2.5-1.5B, and Qwen2.5-3B), we verify both trends and demonstrate that relying solely on either static architecture is structurally limited. To address this challenge, we propose a Mixture of LoRA and Full (MoLF) Fine-Tuning, a unified framework that enables continuous navigation between both training regimes. MoLF dynamically routes updates between FFT and LoRA at the optimizer level to ensure that exact gradient signals are available to both experts throughout training, yielding stable training dynamics. For memory-constrained environments, we also introduce MoLF-Efficient, which freezes base weights and only routes updates among a pair of LoRA experts of potentially varying rank. Our evaluations show that MoLF either improves on or stays within $1.5\%$ of the better of FFT and LoRA across all settings, while MoLF-Efficient outperforms prior adaptive LoRA approaches by up to $20\%$ on Fact and $9\%$ on Med and SQL.
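The abstract says MoLF routes updates between FFT and LoRA at the optimizer level so that both experts receive exact gradients throughout training, but it does not spell out the gradient-guided routing rule. The sketch below is a hypothetical illustration of the optimizer-level mechanics only, for a single linear layer y = (W + s·B·A)x with squared-error loss: one backward pass yields exact gradients for the full-weight expert W and the LoRA factors A and B, and a caller-supplied weight `route` in [0, 1] (an assumption standing in for the paper's router) decides how much of each expert's SGD step is applied. All function names and the toy data are illustrative, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): optimizer-level routing between a
# full-rank expert W and a LoRA expert (B, A) for one linear layer
# y = (W + s * B @ A) x trained with squared-error loss.
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def add_scaled(X, Y, c):
    """Elementwise X + c * Y."""
    return [[a + c * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def loss_and_grads(W, A, B, s, x, t):
    """Loss 0.5 * ||W_eff x - t||^2 and exact gradients w.r.t. W, A and B."""
    W_eff = add_scaled(W, matmul(B, A), s)
    err = [sum(w * xi for w, xi in zip(row, x)) - ti for row, ti in zip(W_eff, t)]
    loss = 0.5 * sum(e * e for e in err)
    G = [[e * xi for xi in x] for e in err]  # dL/dW_eff = err x^T
    grad_W = G                               # W enters W_eff with coefficient 1
    grad_B = [[s * v for v in row] for row in matmul(G, transpose(A))]  # s * G A^T
    grad_A = [[s * v for v in row] for row in matmul(transpose(B), G)]  # s * B^T G
    return loss, grad_W, grad_A, grad_B


def routed_sgd_step(W, A, B, s, x, t, lr, route):
    """One SGD step. Both experts always get exact gradients; `route` in [0, 1]
    (a hypothetical, caller-chosen routing weight) splits the step between them."""
    if not 0.0 <= route <= 1.0:
        raise ValueError("route must lie in [0, 1]")
    loss, gW, gA, gB = loss_and_grads(W, A, B, s, x, t)
    W = add_scaled(W, gW, -lr * (1.0 - route))  # full fine-tuning expert
    A = add_scaled(A, gA, -lr * route)          # LoRA expert
    B = add_scaled(B, gB, -lr * route)
    return W, A, B, loss  # loss is the value before this step


# Tiny usage example with arbitrary sizes and data.
random.seed(0)
d_out, d_in, r = 3, 4, 2
W = [[0.0] * d_in for _ in range(d_out)]
A = [[random.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # common LoRA init: B = 0, so B @ A starts at 0
x, t = [1.0, 0.5, -0.5, 2.0], [1.0, -1.0, 0.5]

losses = []
for _ in range(200):
    W, A, B, loss = routed_sgd_step(W, A, B, s=1.0, x=x, t=t, lr=0.05, route=0.5)
    losses.append(loss)
print(f"loss: first step {losses[0]:.4f}, last step {losses[-1]:.2e}")
```

With route = 0 the step reduces to plain full fine-tuning of W and with route = 1 to plain LoRA training, yet the gradients for both experts are computed either way, which is the property the abstract emphasizes. By analogy with MoLF-Efficient as described above, one could instead freeze W and route between two (A, B) pairs of different ranks; that variant is not shown here.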

Related papers