Fugu-MT Paper Translation (Summary): GRIT -- Geometry-Aware PEFT with K-FAC Preconditioning, Fisher-Guided Reprojection, and Dynamic Rank Adaptation

Paper summary: GRIT -- Geometry-Aware PEFT with K-FAC Preconditioning, Fisher-Guided Reprojection, and Dynamic Rank Adaptation

arxiv url: http://arxiv.org/abs/2601.00231v1
Date: Thu, 01 Jan 2026 06:31:54 GMT
Status: Translation complete
Last updated (in system): 2026-01-05 15:04:33.328807
Title: GRIT -- Geometry-Aware PEFT with K-FAC Preconditioning, Fisher-Guided Reprojection, and Dynamic Rank Adaptation
Title (reference translation): GRIT -- Geometry-Aware PEFT with K-FAC Preconditioning, Fisher-Guided Reprojection, and Dynamic Rank Adaptation
Authors: Pritish Saha, Chandrav Rajbangshi, Rudra Goyal, Mohit Goyal, Anurag Deo, Biswajit Roy, Ningthoujam Dhanachandra Singh, Raxit Goswami, Amitava Das
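Abstract summary: GRIT is a curvature-aware LoRA procedure that preserves the LoRA parameterization. It matches or surpasses LoRA and QLoRA while reducing trainable parameters by 46% on average. GRIT yields lower drift and a better updates-vs-retention frontier than strong PEFT-optimizer baselines.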
Reference score (attention score computed in-house): 4.748720471060117
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Parameter-efficient fine-tuning (PEFT) is the default way to adapt LLMs, but widely used LoRA and QLoRA are largely geometry-agnostic: they optimize in fixed, randomly oriented low-rank subspaces with first-order descent, mostly ignoring local loss curvature. This can inflate the effective update budget and amplify drift along weakly constrained directions. We introduce GRIT, a dynamic, curvature-aware LoRA procedure that preserves the LoRA parameterization but: (1) preconditions gradients in rank space using K-FAC as a natural-gradient proxy; (2) periodically reprojects the low-rank basis onto dominant Fisher eigendirections to suppress drift; and (3) adapts the effective rank from the spectrum so capacity concentrates where signal resides. Across instruction-following, comprehension, and reasoning benchmarks on LLaMA backbones, GRIT matches or surpasses LoRA and QLoRA while reducing trainable parameters by 46% on average (25–80% across tasks), without practical quality loss across prompt styles and data mixes. To model forgetting, we fit a curvature-modulated power law. Empirically, GRIT yields lower drift and a better updates-vs-retention frontier than strong PEFT-optimizer baselines (Orthogonal-LoRA, IA3, DoRA, Eff-FT, Shampoo).
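Abstract (reference translation): Parameter-efficient fine-tuning (PEFT) is the default way to adapt LLMs, but the widely used LoRA and QLoRA are geometry-agnostic. This can inflate the effective update budget and amplify drift along weakly constrained directions. GRIT preserves the LoRA parameterization but (1) preconditions gradients in rank space using K-FAC as a natural-gradient proxy, (2) periodically reprojects the low-rank basis onto dominant Fisher eigendirections to suppress drift, and (3) adapts the effective rank from the spectrum so that capacity concentrates where the signal resides. Across instruction-following, comprehension, and reasoning benchmarks on LLaMA backbones, GRIT matches or surpasses LoRA and QLoRA while reducing trainable parameters by 46% on average (25–80% across tasks). To model forgetting, it fits a curvature-modulated power law. Empirically, GRIT yields lower drift and a better updates-vs-retention frontier than strong PEFT-optimizer baselines (Orthogonal-LoRA, IA3, DoRA, Eff-FT, Shampoo).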
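The abstract names three mechanisms without giving formulas. The NumPy sketch below illustrates textbook versions of each on a single LoRA down-projection `A_lora` (shape r x d_in), under our own assumptions rather than the paper's: K-FAC factors are empirical covariances of layer inputs and rank-space gradients with fixed damping; "reprojection" projects the rows of `A_lora` onto the top-r eigenvectors of the input-side factor; and "effective rank" is the smallest k whose top eigenvalues of the rank-space factor reach an energy threshold. All function names, the damping value, and the threshold are hypothetical.

```python
import numpy as np


def kfac_factors(x, g_rank, damping=1e-3):
    """Damped empirical K-FAC factors for A_lora (r x d_in).

    x:      (n, d_in) inputs to the adapted layer
    g_rank: (n, r)    per-sample gradients w.r.t. the rank-space output A_lora @ x
    """
    n = x.shape[0]
    s_in = x.T @ x / n + damping * np.eye(x.shape[1])            # ~E[x x^T]
    s_rank = g_rank.T @ g_rank / n + damping * np.eye(g_rank.shape[1])  # ~E[g g^T]
    return s_in, s_rank


def precondition(grad, s_in, s_rank):
    """K-FAC natural-gradient proxy: S_rank^{-1} @ grad @ S_in^{-1}."""
    left = np.linalg.solve(s_rank, grad)       # S_rank^{-1} grad, shape (r, d_in)
    return np.linalg.solve(s_in, left.T).T      # right-multiply by S_in^{-1} (S_in symmetric)


def top_eigvecs(s, k):
    """Top-k eigenpairs of a symmetric matrix, largest eigenvalue first."""
    vals, vecs = np.linalg.eigh(s)             # eigh returns ascending order
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]


def reproject(a_lora, s_in):
    """Project the rows of A_lora onto the top-r eigendirections of S_in."""
    r = a_lora.shape[0]
    _, u = top_eigvecs(s_in, r)                # u: (d_in, r), orthonormal columns
    return a_lora @ u @ u.T


def select_rank(eigvals, energy=0.9, r_min=1):
    """Smallest k whose largest k eigenvalues hold `energy` of the total mass."""
    vals = np.sort(np.clip(eigvals, 0.0, None))[::-1]
    total = vals.sum()
    if total <= 0.0:
        return r_min
    cum = np.cumsum(vals) / total
    k = int(np.searchsorted(cum, energy)) + 1  # first index with cum >= energy
    return max(r_min, min(k, len(vals)))


def truncate_rank(b_lora, a_lora, s_rank, k):
    """Keep the k dominant rank-space directions V_k of S_rank: (B V_k, V_k^T A)."""
    _, v = top_eigvecs(s_rank, k)              # v: (r, k)
    return b_lora @ v, v.T @ a_lora


# Toy usage on random data; only the first few rank directions carry signal.
rng = np.random.default_rng(0)
d_in, d_out, r, n = 16, 12, 8, 64
a_lora = rng.normal(size=(r, d_in)) * 0.1
b_lora = rng.normal(size=(d_out, r)) * 0.1
x = rng.normal(size=(n, d_in))
g_rank = rng.normal(size=(n, r)) * np.array([3.0, 2.0, 1.0, 0.1, 0.1, 0.1, 0.1, 0.1])
grad_a = g_rank.T @ x / n                      # batch-mean gradient w.r.t. A_lora

s_in, s_rank = kfac_factors(x, g_rank)
a_lora -= 0.1 * precondition(grad_a, s_in, s_rank)   # (1) preconditioned step
a_lora = reproject(a_lora, s_in)                      # (2) periodic reprojection
k = select_rank(np.linalg.eigvalsh(s_rank), energy=0.9)  # (3) effective rank
b_k, a_k = truncate_rank(b_lora, a_lora, s_rank, k)
```

Truncating through the eigenbasis of the rank-space factor keeps `b_k @ a_k` equal to `B V_k V_k^T A`, so with k = r the adapter product is unchanged; how often to reproject and how the paper actually couples these steps is not stated in this summary.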

Related papers