Fugu-MT Paper Translation (Abstract): LoRA in LoRA: Towards Parameter-Efficient Architecture Expansion for Continual Visual Instruction Tuning

Paper overview: LoRA in LoRA: Towards Parameter-Efficient Architecture Expansion for Continual Visual Instruction Tuning

arxiv url: http://arxiv.org/abs/2508.06202v1
Date: Fri, 08 Aug 2025 10:32:38 GMT
Status: Translation complete
Last updated in system: 2025-08-11 20:39:06.197172
Title: LoRA in LoRA: Towards Parameter-Efficient Architecture Expansion for Continual Visual Instruction Tuning
Title (reference translation): LoRA in LoRA: Toward Parameter-Efficient Architecture Expansion for Continual Visual Instruction Tuning
Authors: Chang Che, Ziqi Wang, Pengwan Yang, Qi Wang, Hui Ma, Zenglin Shi
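Abstract summary: We introduce LiLoRA, a highly efficient architecture expansion method tailored for CVIT in MLLMs. LiLoRA shares the LoRA matrix A across tasks to reduce redundancy, applies an additional low-rank decomposition to matrix B to minimize task-specific parameters, and incorporates a cosine-regularized stability loss to maintain consistency over time. Experiments show that LiLoRA achieves consistently strong performance in sequential task learning and significantly improves parameter efficiency compared with existing methods.
Reference score (independently computed attention score): 12.165720711684758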
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Continual Visual Instruction Tuning (CVIT) enables Multimodal Large Language Models (MLLMs) to incrementally learn new tasks over time. However, this process is challenged by catastrophic forgetting, where performance on previously learned tasks deteriorates as the model adapts to new ones. A common approach to mitigate forgetting is architecture expansion, which introduces task-specific modules to prevent interference. Yet, existing methods often expand entire layers for each task, leading to significant parameter overhead and poor scalability. To overcome these issues, we introduce LoRA in LoRA (LiLoRA), a highly efficient architecture expansion method tailored for CVIT in MLLMs. LiLoRA shares the LoRA matrix A across tasks to reduce redundancy, applies an additional low-rank decomposition to matrix B to minimize task-specific parameters, and incorporates a cosine-regularized stability loss to preserve consistency in shared representations over time. Extensive experiments on a diverse CVIT benchmark show that LiLoRA consistently achieves superior performance in sequential task learning while significantly improving parameter efficiency compared to existing approaches.
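Abstract (reference translation): Continual Visual Instruction Tuning (CVIT) enables Multimodal Large Language Models (MLLMs) to learn new tasks incrementally over time. This process, however, is hampered by catastrophic forgetting, in which performance on previously learned tasks degrades as the model adapts to new ones. A common way to mitigate forgetting is architecture expansion, which introduces task-specific modules to prevent interference. Existing methods, however, often expand entire layers for each task, which leads to large parameter overhead and poor scalability. To overcome these issues, we introduce LoRA in LoRA (LiLoRA), a highly efficient architecture expansion method suited to CVIT in MLLMs. LiLoRA shares the LoRA matrix A across tasks to reduce redundancy, applies an additional low-rank decomposition to matrix B to minimize task-specific parameters, and incorporates a cosine-regularized stability loss to maintain consistency in shared representations over time. Extensive experiments on a diverse CVIT benchmark show that LiLoRA consistently achieves superior performance in sequential task learning while significantly improving parameter efficiency over existing methods.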
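
The three ingredients named in the abstract (a shared matrix A, a further low-rank factorization of B, and a cosine-based stability term) can be illustrated with a small NumPy sketch. The abstract does not give the exact formulation, so everything specific here is an assumption made for illustration: the parameterization `B_t = B0 + P_t @ Q_t`, the loss form `1 - cos(current, snapshot)`, and all function and variable names are hypothetical and should not be read as the authors' implementation.

```python
import numpy as np


def init_shared(d_in, d_out, r, rng):
    """Shared LoRA factors reused by every task (assumed layout)."""
    A = rng.normal(0.0, 0.01, size=(r, d_in))  # shared down-projection
    B0 = np.zeros((d_out, r))                  # shared part of B, starts at zero
    return A, B0


def init_task(d_out, r, k, rng):
    """Task-specific rank-k factors for B (assumed: B_t = B0 + P @ Q)."""
    P = np.zeros((d_out, k))                   # zero init: no update at the start
    Q = rng.normal(0.0, 0.01, size=(k, r))
    return P, Q


def delta_w(A, B0, P, Q):
    """Weight update for one task: (B0 + P Q) A, shape (d_out, d_in)."""
    return (B0 + P @ Q) @ A


def cosine_stability_loss(current, snapshot, eps=1e-12):
    """Assumed stability term: 1 - cosine similarity between the shared
    matrix now and a frozen copy taken before training on the new task."""
    c, s = current.ravel(), snapshot.ravel()
    cos = (c @ s) / (np.linalg.norm(c) * np.linalg.norm(s) + eps)
    return 1.0 - cos


def task_param_count(d_out, r, k):
    """Parameters added per task by the rank-k factors P and Q."""
    return k * (d_out + r)
```

Under these assumptions, with `d_out = 4096`, `r = 16`, and `k = 4`, each new task adds `task_param_count(4096, 16, 4)` = 16448 parameters per adapted layer, compared with 4096 × 16 = 65536 for a full task-specific B. The sizes are illustrative and are not taken from the paper.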

Related papers