Fugu-MT Paper Translation (Summary): Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy

Paper overview: Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy

arxiv url: http://arxiv.org/abs/2507.13260v1
Date: Thu, 17 Jul 2025 16:09:05 GMT
Status: Translation complete
Last updated in system: 2025-07-18 20:10:24.570587
Title: Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy
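Title (reference translation): Adaptability of Pre-trained Vision Transformers via Approximately Orthogonal Fine-Tuning
Authors: Yiting Yang, Hao Luo, Yuan Sun, Qingsen Yan, Haokui Zhang, Wei Dong, Guoqing Wang, Peng Wang, Yang Yang, Hengtao Shen
Abstract summary: We propose a way of representing low-rank weight matrices via Approximately Orthogonal Fine-Tuning (AOFT). The method achieves competitive performance on downstream image classification tasks.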
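The low-rank weight matrices in question are, according to the abstract, the product of a down-projection and an up-projection matrix, as in LoRA. The following is a minimal pure-Python sketch of that structure, added the frozen backbone weight as W + scale * (up @ down). The helper names (`matmul`, `adapted_weight`) and the toy sizes are hypothetical and do not come from the paper. Zero-initialising the up-projection follows the LoRA convention, so the adapted weight starts out equal to the frozen one.

```python
import random


def matmul(x, y):
    """Multiply matrices stored as lists of rows: (n x k) @ (k x m) -> (n x m)."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def adapted_weight(w_frozen, down, up, scale=1.0):
    """Return W + scale * (up @ down): a frozen weight plus a rank <= r update.

    w_frozen: d_out x d_in backbone weight (not trained)
    down:     r x d_in down-projection (trainable)
    up:       d_out x r up-projection (trainable)
    """
    delta = matmul(up, down)  # d_out x d_in, rank at most r
    return [[w + scale * dw for w, dw in zip(w_row, d_row)]
            for w_row, d_row in zip(w_frozen, delta)]


random.seed(0)
d_in, d_out, r = 6, 6, 2  # hypothetical toy sizes
w = [[random.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
down = [[random.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]
up = [[0.0] * r for _ in range(d_out)]  # zero init, so training starts from w

w_adapted = adapted_weight(w, down, up)
n_trainable = r * d_in + d_out * r  # adapter parameters
n_full = d_out * d_in               # parameters touched by full fine-tuning
```

Only `down` and `up` would be trained. For these toy sizes that means 24 parameters instead of 36, and the gap grows quickly with the hidden size.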
Reference score (attention score computed in-house): 57.54306942529943
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: A prevalent approach in Parameter-Efficient Fine-Tuning (PEFT) of pre-trained Vision Transformers (ViT) involves freezing the majority of the backbone parameters and solely learning low-rank adaptation weight matrices to accommodate downstream tasks. These low-rank matrices are commonly derived through the multiplication structure of down-projection and up-projection matrices, exemplified by methods such as LoRA and Adapter. In this work, we observe an approximate orthogonality among any two row or column vectors within any weight matrix of the backbone parameters; however, this property is absent in the vectors of the down/up-projection matrices. Approximate orthogonality implies a reduction in the upper bound of the model's generalization error, signifying that the model possesses enhanced generalization capability. If the fine-tuned down/up-projection matrices were to exhibit this same property as the pre-trained backbone matrices, could the generalization capability of fine-tuned ViTs be further augmented? To address this question, we propose an Approximately Orthogonal Fine-Tuning (AOFT) strategy for representing the low-rank weight matrices. This strategy employs a single learnable vector to generate a set of approximately orthogonal vectors, which form the down/up-projection matrices, thereby aligning the properties of these matrices with those of the backbone. Extensive experimental results demonstrate that our method achieves competitive performance across a range of downstream image classification tasks, confirming the efficacy of the enhanced generalization capability embedded in the down/up-projection matrices.
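The abstract makes two technical claims. First, row and column vectors of pretrained weight matrices are approximately orthogonal. Second, AOFT builds the down/up-projection vectors from a single learnable vector. The sketch here illustrates both ideas under stated assumptions and is not the authors' construction. Orthogonality is measured as the largest absolute pairwise cosine similarity, which is an assumed metric (the paper may quantify it differently). As a stand-in generator, the sketch takes rows of a Householder reflection H = I - 2 v v^T / (v^T v), which turns one vector v into exactly orthogonal unit vectors. Exact orthogonality is stronger than the approximate orthogonality AOFT targets. Function names, sizes, and the use of one vector per projection matrix are all hypothetical.

```python
import math
import random


def max_abs_cosine(vectors):
    """Largest |cosine similarity| over all pairs; 0.0 means exactly orthogonal."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
    return max(abs(cos(vectors[i], vectors[j]))
               for i in range(len(vectors)) for j in range(i + 1, len(vectors)))


def householder_rows(v, r):
    """First r rows of the Householder matrix H = I - 2 v v^T / (v^T v).

    Stand-in generator (not necessarily AOFT's): H is orthogonal, so the
    returned rows are mutually orthogonal unit vectors, all derived from v.
    """
    d = len(v)
    vv = sum(a * a for a in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(d)]
            for i in range(r)]


def transpose(m):
    return [list(col) for col in zip(*m)]


random.seed(0)
d, r = 64, 4  # hypothetical hidden size and rank
v_down = [random.gauss(0.0, 1.0) for _ in range(d)]  # single vector for the down-projection
v_up = [random.gauss(0.0, 1.0) for _ in range(d)]    # single vector for the up-projection

down = householder_rows(v_down, r)         # r x d, orthonormal rows
up = transpose(householder_rows(v_up, r))  # d x r, orthonormal columns

print("max |cos| between down-projection rows:", max_abs_cosine(down))
print("max |cos| between up-projection columns:", max_abs_cosine(transpose(up)))
```

`max_abs_cosine` can also be applied to the rows or columns of any exported weight matrix to check the kind of approximate orthogonality the abstract reports for the backbone. The stand-in has a clear limitation: each generated row equals a standard basis vector e_i minus a multiple of v, so its expressiveness is narrow. That is a property of this sketch, not a claim about AOFT.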

List of related papers