Fugu-MT Paper Translation (Abstract): Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation

Paper abstract: Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation

arxiv url: http://arxiv.org/abs/2605.07302v1
Date: Fri, 08 May 2026 06:12:43 GMT
Status: Translation complete
System update date: 2026-05-11 19:43:38.846361
Title: Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation
Title (reference translation): Pretraining induces a reusable spectral basis for downstream task adaptation
Authors: Junjie Yu, Yue Wang, Zihan Deng, Yan Zhu, Wenxiao Ma, Quanying Liu
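Abstract summary: Finetuning of pretrained models takes place in a low-dimensional subspace of the full parameter space. Are the stable directions irrelevant to downstream tasks, or do they already encode task-relevant structure that requires no further adjustment? The paper shows that the leading singular vectors of pretrained weight matrices remain highly stable under finetuning and are shared across unrelated downstream tasks.
Reference score (independently computed attention score): 10.547646302449682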
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Finetuning pretrained models occurs in a low-dimensional subspace of the full parameter space. Prior work has focused on characterizing this optimization subspace, but largely ignored the complementary question: why do certain directions remain unexplored during finetuning? Are these stable directions irrelevant to downstream tasks, or do they already encode task-relevant structure that requires no further adjustment? Answering this question is central to understanding how pretrained knowledge transfers. Through systematic spectral analysis across vision and language models, we show that the leading singular vectors of pretrained weight matrices remain highly stable under finetuning and are shared across unrelated downstream tasks, revealing that pretraining establishes a reusable spectral coordinate system. Models pretrained on larger datasets exhibit greater spectral stability under distribution shift or task change, directly linking pretraining scale to geometric transferability. Motivated by these findings, we propose a parameter-efficient method that freezes pretrained singular vectors and optimizes only leading spectral coefficients, achieving competitive performance on GLUE with 0.2% trainable parameters. Our results reveal that the stable directions encode transferable structure rather than irrelevant noise: successful pretraining discovers spectral bases that downstream tasks inherit and operate within.
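
The stability claim in the abstract can be made concrete with a small numerical check. The sketch below is not the authors' analysis code; it assumes one common way to quantify stability, namely the normalized overlap between the top-k left singular subspaces of a weight matrix before and after finetuning. The function name `leading_subspace_overlap`, the random matrices, and the perturbation scale are all hypothetical stand-ins.

```python
import numpy as np

def leading_subspace_overlap(w_pre, w_ft, k):
    """Overlap in [0, 1] between the top-k left singular subspaces of two matrices."""
    u_pre, _, _ = np.linalg.svd(w_pre, full_matrices=False)
    u_ft, _, _ = np.linalg.svd(w_ft, full_matrices=False)
    # The columns of u_pre and u_ft are orthonormal, so the squared Frobenius
    # norm of U_pre_k^T U_ft_k, divided by k, is 1 for identical subspaces
    # and 0 for mutually orthogonal ones.
    cross = u_pre[:, :k].T @ u_ft[:, :k]
    return float(np.sum(cross ** 2) / k)

rng = np.random.default_rng(0)
w_pre = rng.standard_normal((64, 32))      # stand-in for a pretrained weight matrix
# Hypothetical "finetuned" weights: a small random perturbation of w_pre.
w_ft = w_pre + 0.01 * rng.standard_normal((64, 32))
w_other = rng.standard_normal((64, 32))    # unrelated matrix, used as a baseline

overlap_ft = leading_subspace_overlap(w_pre, w_ft, k=4)
overlap_random = leading_subspace_overlap(w_pre, w_other, k=4)
print(f"overlap after small update: {overlap_ft:.3f}")
print(f"overlap with unrelated matrix: {overlap_random:.3f}")
```

An overlap close to 1 means the leading directions barely moved, while an unrelated matrix gives a much lower value. Real finetuning updates are not random perturbations, so this only illustrates the metric, not the paper's finding.
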
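Abstract (reference translation): Finetuning of pretrained models takes place in a low-dimensional subspace of the full parameter space. Prior work has focused on characterizing this optimization subspace but has largely ignored the complementary question: why do certain directions remain unexplored during finetuning? Are these stable directions irrelevant to downstream tasks, or do they already encode task-relevant structure that requires no further adjustment? Answering this question is central to understanding how pretrained knowledge transfers. Systematic spectral analysis across vision and language models shows that the leading singular vectors of pretrained weight matrices are highly stable under finetuning and are shared across unrelated downstream tasks, revealing that pretraining establishes a reusable spectral coordinate system. Models pretrained on larger datasets show greater spectral stability under distribution shift or task change, directly linking pretraining scale to geometric transferability. Motivated by these results, the authors propose a parameter-efficient method that freezes the pretrained singular vectors and optimizes only the leading spectral coefficients, achieving competitive performance on GLUE with 0.2% trainable parameters. The results reveal that the stable directions encode transferable structure rather than irrelevant noise.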
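
The parameter-efficient method is described only at a high level (freeze the pretrained singular vectors, optimize the leading spectral coefficients). The following is a minimal sketch of that idea on a toy least-squares problem, assuming the weight is parameterized as W = U diag(s) V^T with U and V frozen and only the first k entries of s trainable. The dimensions, the learning rate, the toy target, and the helper names `compose` and `loss_and_grad` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, k = 16, 12, 4

# Stand-in for a pretrained weight matrix (hypothetical random data).
w_pre = rng.standard_normal((d_out, d_in))
u, s_pre, vt = np.linalg.svd(w_pre, full_matrices=False)  # u and vt stay frozen

# Toy task: the target differs from w_pre only in its leading k coefficients.
s_target = s_pre.copy()
s_target[:k] *= 1.5
x = rng.standard_normal((256, d_in))
y = x @ ((u * s_target) @ vt).T

def compose(s_lead):
    """Rebuild W from frozen bases, trainable leading coefficients, frozen tail."""
    s = np.concatenate([s_lead, s_pre[k:]])
    return (u * s) @ vt

def loss_and_grad(s_lead):
    err = x @ compose(s_lead).T - y             # residuals, shape (n, d_out)
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))
    grad_w = err.T @ x / x.shape[0]             # dL/dW, shape (d_out, d_in)
    # Chain rule: dL/ds_r = u_r^T (dL/dW) v_r, needed only for r < k.
    grad_s = np.einsum("ir,ij,rj->r", u[:, :k], grad_w, vt[:k, :])
    return loss, grad_s

s_lead = s_pre[:k].copy()                       # the only trainable parameters
initial_loss, _ = loss_and_grad(s_lead)
for _ in range(100):                            # plain gradient descent
    _, grad = loss_and_grad(s_lead)
    s_lead -= 0.5 * grad
final_loss, _ = loss_and_grad(s_lead)

trainable_fraction = k / (d_out * d_in)
print(f"loss: {initial_loss:.4f} -> {final_loss:.2e}")
print(f"trainable fraction of this matrix: {trainable_fraction:.4f}")
```

In this toy setting only k scalars per matrix are updated, compared with d_out * d_in entries for full finetuning. The 0.2% figure reported for GLUE depends on the model and on choices the abstract does not specify, so the fraction printed here is not comparable to it.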
