Fugu-MT Paper Translation (Abstract): Strategic Over-Parameterization for Generalizable Low-Rank Adaptation

Paper summary: Strategic Over-Parameterization for Generalizable Low-Rank Adaptation

arxiv url: http://arxiv.org/abs/2605.16470v1
Date: Fri, 15 May 2026 12:26:17 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:46.500777
Title: Strategic Over-Parameterization for Generalizable Low-Rank Adaptation
Authors: Jing Gao, Zhong-Yi Lu, Pan Zhang, Ze-Feng Gao
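Abstract summary: Adapting large language models to downstream tasks via full fine-tuning is increasingly impractical due to its computational and memory demands. LoRA-Over is a framework grounded in a simple principle: enrich the optimization landscape during training, then collapse the enrichment at inference. LoRA-Over is evaluated on language understanding (GLUE, T5-Base), dialogue (MT-Bench), arithmetic reasoning (GSM8K), and code generation (HumanEval), using LLaMA 2-7B and LLaMA 3.1-8B.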
Reference score (independently computed attention score): 14.867641913391779
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Adapting large language models (LLMs) to downstream tasks via full fine-tuning is increasingly impractical due to its computational and memory demands. Parameter-efficient fine-tuning (PEFT) approaches such as Low-Rank Adaptation (LoRA) mitigate this by confining updates to a compact set of trainable parameters, but this aggressive reduction often sacrifices generalization, especially under transfer across heterogeneous tasks and domains. We revisit the tension between parameter efficiency and adaptation capacity, and ask whether the two are truly at odds. We answer in the negative by introducing LoRA-Over, a framework grounded in a simple principle: enrich the optimization landscape during training, then collapse the enrichment at inference. LoRA-Over injects auxiliary parameters into the low-rank adapters during training to broaden the effective hypothesis space, and through a decomposition-based reformulation folds them back into a standard low-rank structure with negligible reconstruction error, keeping inference cost identical to vanilla LoRA. Since not all weight matrices benefit equally from added capacity, we further propose two scheduling strategies, one statically predefined and one dynamically determined at runtime, that direct extra capacity where most needed. We evaluate LoRA-Over on language understanding (GLUE, T5-Base), dialogue (MT-Bench), arithmetic reasoning (GSM8K), and code generation (HumanEval), using LLaMA 2-7B and LLaMA 3.1-8B. Across all benchmarks and scales, LoRA-Over consistently outperforms vanilla LoRA, showing that principled over-parameterization designed to vanish at inference is an effective lever for improving PEFT generalization. Code will be released upon acceptance.
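The "enrich during training, collapse at inference" idea in the abstract can be illustrated with a small NumPy sketch. The abstract does not say what form the auxiliary parameters take or which decomposition is used, so everything here is an assumption: the over-parameterization is modeled as a larger inner dimension `R` in the adapter factors, and the fold-back is a truncated SVD to the target rank `r`. The function name `fold_to_low_rank` and all shapes are hypothetical and are not taken from the paper.

```python
import numpy as np


def fold_to_low_rank(B_over, A_over, r):
    """Collapse an over-parameterized update B_over @ A_over into rank-r factors.

    Assumption: a truncated SVD stands in for the paper's unspecified
    decomposition-based reformulation.
    """
    delta = B_over @ A_over                         # full update, shape (d_out, d_in)
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    root = np.sqrt(S[:r])                           # split singular values across both factors
    B = U[:, :r] * root                             # shape (d_out, r)
    A = root[:, None] * Vt[:r, :]                   # shape (r, d_in)
    # Relative Frobenius error of the rank-r reconstruction
    rel_err = np.linalg.norm(delta - B @ A) / np.linalg.norm(delta)
    return B, A, rel_err


rng = np.random.default_rng(0)
d_out, d_in, R, r = 64, 48, 16, 4

# Hypothetical "trained" factors: r dominant directions plus R - r weak extra ones.
B_over = np.concatenate(
    [rng.normal(size=(d_out, r)), 1e-3 * rng.normal(size=(d_out, R - r))],
    axis=1,
)
A_over = rng.normal(size=(R, d_in))

B, A, rel_err = fold_to_low_rank(B_over, A_over, r)
print(B.shape, A.shape, rel_err)
```

After folding, the adapter stores only `B` and `A` at rank `r`, which is the same shape a vanilla LoRA adapter would have. The reconstruction error stays small in this sketch only because the synthetic update was built to be close to rank `r`; whether a real trained update behaves this way depends on the method's details, which the abstract does not give.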
