Fugu-MT paper translation (abstract): Layer-wise LoRA fine-tuning: a similarity metric approach

Paper summary: Layer-wise LoRA fine-tuning: a similarity metric approach

arxiv url: http://arxiv.org/abs/2602.05988v1
Date: Thu, 05 Feb 2026 18:38:53 GMT
Status: translation complete
Last updated in system: 2026-02-06 18:49:09.125168
Title: Layer-wise LoRA fine-tuning: a similarity metric approach
Title (reference translation): Layered LoRA fine-tuning: a similarity metric approach
Authors: Keith Ando Ogawa, Bruno Lopes Yamamoto, Lucas Lauton de Alcantara, Lucas Pellicer, Rosimeire Pereira Costa, Edson Bollis, Anna Helena Reali Costa, Artur Jordao
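Abstract summary: Low-Rank Adaptation (LoRA) techniques aim to reduce the computational cost of fine-tuning by freezing the pre-trained model and updating a small number of parameters. Because that reduction may be insufficient as models grow, this work systematically selects only a few layers to fine-tune with LoRA or its variants. It reduces the trainable parameters of LoRA-based techniques by up to 50% while maintaining predictive performance across different models and tasks.
Reference score (independently computed attention level): 0.6323908398583081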
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pre-training Large Language Models (LLMs) on web-scale datasets has become fundamental for advancing general-purpose AI. In contrast, enhancing their predictive performance on downstream tasks typically involves adapting their knowledge through fine-tuning. Parameter-efficient fine-tuning techniques, such as Low-Rank Adaptation (LoRA), aim to reduce the computational cost of this process by freezing the pre-trained model and updating a smaller number of parameters. In comparison to full fine-tuning, these methods achieve over 99% reduction in trainable parameter count, depending on the configuration. Unfortunately, such a reduction may prove insufficient as LLMs continue to grow in scale. In this work, we address this problem by systematically selecting only a few layers to fine-tune using LoRA or its variants. We argue that not all layers contribute equally to the model adaptation. Leveraging this, we identify the most relevant layers to fine-tune by measuring their contribution to changes in internal representations. Our method is orthogonal to and readily compatible with existing low-rank adaptation techniques. We reduce the trainable parameters in LoRA-based techniques by up to 50%, while maintaining the predictive performance across different models and tasks. Specifically, on encoder-only architectures, this reduction in trainable parameters leads to a negligible predictive performance drop on the GLUE benchmark. On decoder-only architectures, we achieve a small drop or even improvements in the predictive performance on mathematical problem-solving capabilities and coding tasks. Finally, this effectiveness extends to multimodal models, for which we also observe competitive results relative to fine-tuning with LoRA modules in all layers. Code is available at: https://github.com/c2d-usp/Layer-wise-LoRA-with-CKA
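The parameter arithmetic behind the "over 99%" and "up to 50%" figures can be illustrated with a rough sketch. The hidden size, layer count, rank, and the choice of two adapted square projection matrices per layer are illustrative assumptions, not values taken from the paper. The comparison also counts only the adapted matrices, so the reduction relative to a whole model would be larger.

```python
def dense_params(d_model: int, n_layers: int, mats_per_layer: int) -> int:
    """Weights in the adapted d_model x d_model matrices when trained directly."""
    return n_layers * mats_per_layer * d_model * d_model


def lora_params(d_model: int, n_layers: int, mats_per_layer: int, rank: int) -> int:
    """LoRA trains two factors per adapted matrix: A (rank x d) and B (d x rank)."""
    return n_layers * mats_per_layer * 2 * d_model * rank


def reduction(before: int, after: int) -> float:
    """Fraction of trainable parameters removed when going from before to after."""
    return 1.0 - after / before


if __name__ == "__main__":
    # Hypothetical configuration, chosen only for illustration.
    d_model, n_layers, mats, rank = 4096, 32, 2, 8

    dense = dense_params(d_model, n_layers, mats)
    lora_all = lora_params(d_model, n_layers, mats, rank)
    # Placing LoRA modules in only half of the layers halves the LoRA parameters.
    lora_half = lora_params(d_model, n_layers // 2, mats, rank)

    print(f"LoRA vs. dense:           {reduction(dense, lora_all):.2%} fewer")
    print(f"half layers vs. all LoRA: {reduction(lora_all, lora_half):.2%} fewer")
```

Because the LoRA parameter count is linear in the number of adapted layers, dropping LoRA modules from a fraction of the layers removes the same fraction of trainable parameters.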
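Abstract (reference translation): Pre-training large language models (LLMs) on web-scale datasets is essential for advancing general-purpose AI. In contrast, improving predictive performance on downstream tasks usually requires adapting their knowledge through fine-tuning. Parameter-efficient fine-tuning techniques such as Low-Rank Adaptation (LoRA) aim to reduce the computational cost of this process by freezing the pre-trained model and updating a small number of parameters. Compared with full fine-tuning, these methods reduce the number of trainable parameters by more than 99%, depending on the configuration. Unfortunately, as LLMs continue to grow in scale, such a reduction may be insufficient. In this work, we address that problem by systematically selecting only a few layers to fine-tune with LoRA or its variants. We argue that not all layers contribute equally to model adaptation. Building on this, we identify the most relevant layers to fine-tune by measuring their contribution to changes in internal representations. Our method is orthogonal to, and readily compatible with, existing low-rank adaptation techniques. We reduce the trainable parameters of LoRA-based techniques by up to 50% while maintaining predictive performance across different models and tasks. Specifically, on encoder-only architectures, this reduction in trainable parameters leads to a negligible drop in predictive performance on the GLUE benchmark. On decoder-only architectures, we achieve a small drop or even improvements in predictive performance on mathematical problem solving and coding tasks. Finally, this effectiveness extends to multimodal models, where we also observe results competitive with fine-tuning LoRA modules in all layers. Code: https://github.com/c2d-usp/Layer-wise-LoRA-with-CKA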
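The abstract does not spell out the similarity metric, but the repository name suggests Centered Kernel Alignment (CKA). The following is a minimal sketch of how layers might be ranked, assuming linear CKA between each layer's input and output activations and assuming that layers whose output differs most from their input are the ones selected. The scoring rule, the selection direction, and the names `linear_cka` and `select_layers` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, n_features)."""
    # Center each feature so the similarity ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def select_layers(hidden_states: list, k: int) -> list:
    """Return indices of the k layers that change their input representation most.

    hidden_states[i] is the input to layer i and hidden_states[i + 1] its output,
    each flattened to shape (n_samples, n_features).
    """
    # Assumption: a layer's contribution is scored as 1 - CKA(input, output).
    scores = [
        1.0 - linear_cka(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]
    order = np.argsort(scores)[::-1]  # largest representation change first
    return sorted(int(i) for i in order[:k])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h0 = rng.normal(size=(200, 16))
    h1 = h0.copy()                   # layer 0 leaves the representation unchanged
    h2 = rng.normal(size=(200, 16))  # layer 1 produces an unrelated representation
    h3 = 2.0 * h2                    # layer 2 only rescales (CKA is scale-invariant)
    print(select_layers([h0, h1, h2, h3], k=1))
```

In practice the activations could be collected from a forward pass over a small calibration set, and LoRA modules would then be attached only to the selected layers. Both of those steps are outside this sketch.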
