Fugu-MT paper translation (abstract): AEGIS: Anchor-Enforced Gradient Isolation for Knowledge-Preserving Vision-Language-Action Fine-Tuning

Paper overview: AEGIS: Anchor-Enforced Gradient Isolation for Knowledge-Preserving Vision-Language-Action Fine-Tuning

arxiv url: http://arxiv.org/abs/2604.16067v1
Date: Fri, 17 Apr 2026 13:49:57 GMT
Status: Translation complete
System update date: 2026-04-20 22:00:19.93836
Title: AEGIS: Anchor-Enforced Gradient Isolation for Knowledge-Preserving Vision-Language-Action Fine-Tuning
Title (reference translation): AEGIS: Anchor-Enforced Gradient Isolation for Knowledge Preservation
Authors: Guransh Singh,
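Abstract summary: Adapting pre-trained vision-language models (VLMs) for robotic control requires injecting high-magnitude continuous gradients from a flow-matching action expert into a backbone trained exclusively with cross-entropy. This cross-modal gradient asymmetry causes rapid, severe erosion of the VLM's visual-question-answering (VQA) capability. Industry-standard defences either sever the gradient pathway entirely via stop-gradient, discarding the rich continuous supervision, or restrict parameter capacity through low-rank adapters (LoRA) that constrain the rank of updates but not their direction, and thus still overwrite the pre-trained manifold.
Reference score (independently computed attention score): 0.0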
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Adapting pre-trained vision-language models (VLMs) for robotic control requires injecting high-magnitude continuous gradients from a flow-matching action expert into a backbone trained exclusively with cross-entropy. This cross-modal gradient asymmetry, the spectral dimensionality mismatch between low-rank MSE regression gradients and the high-dimensional semantic manifold sculpted by CE pre-training, causes rapid, severe erosion of the VLM's visual-question-answering (VQA) capability. Industry-standard defences either sever the gradient pathway entirely via stop-gradient, discarding the rich continuous supervision, or restrict parameter capacity through low-rank adapters (LoRA) that constrain the rank of updates but not their direction, and thus still overwrite the pre-trained manifold. We introduce AEGIS (Anchor-Enforced Gradient Isolation System), a buffer-free, layer-wise orthogonal gradient projection framework that enables direct continuous MSE learning while preserving the pre-trained VQA manifold, without any co-training data or replay buffer. AEGIS pre-computes a static Gaussian reference anchor from masked VQA forward passes across all transformer layers, then at each training step constructs a Wasserstein-2 transport penalty that generates an anchor restoration gradient. A sequential dual-backward pass decomposes the task and anchor gradients; for each transformer layer, AEGIS applies a single Gram-Schmidt orthogonal projection that bends the task gradient away from the destructive direction while preserving its constructive content. The projection sheds less than 1% of gradient energy on average, yet eliminates the cumulative activation drift that drives severe forgetting.
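The Wasserstein-2 penalty against a Gaussian anchor mentioned in the abstract can be illustrated with a minimal sketch. It assumes diagonal (per-feature) Gaussians, for which the squared Wasserstein-2 distance has the closed form ||mu_a - mu_b||^2 + ||sigma_a - sigma_b||^2. The abstract does not say whether AEGIS uses diagonal or full covariances, so the function names `gaussian_stats` and `w2_squared_diag` and the diagonal simplification are assumptions made for illustration, not the paper's implementation.

```python
import math


def gaussian_stats(rows):
    """Per-feature mean and variance of a list of activation vectors.

    Hypothetical helper: stands in for collecting layer activations
    and summarising them as a diagonal Gaussian.
    """
    n = len(rows)
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    var = [sum((r[j] - mean[j]) ** 2 for r in rows) / n for j in range(dim)]
    return mean, var


def w2_squared_diag(mu_a, var_a, mu_b, var_b):
    """Squared Wasserstein-2 distance between two diagonal Gaussians.

    Assumption: diagonal covariances, so the distance reduces to a
    squared difference of means plus a squared difference of standard
    deviations, summed over features.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    std_term = sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_a, var_b)
    )
    return mean_term + std_term


# Anchor statistics would be computed once and kept fixed; the current
# statistics would be recomputed at every training step.
anchor_mu, anchor_var = gaussian_stats([[0.0, 0.0], [2.0, 2.0]])
current_mu, current_var = gaussian_stats([[3.0, 4.0], [5.0, 6.0]])
penalty = w2_squared_diag(current_mu, current_var, anchor_mu, anchor_var)
```

In an actual training loop the penalty would be computed on differentiable tensors so that its gradient with respect to the backbone parameters can act as the anchor restoration gradient; this plain-Python version only shows the arithmetic.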
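Abstract (reference translation): Adapting pre-trained vision-language models (VLMs) for robotic control requires injecting high-magnitude continuous gradients from a flow-matching action expert into a backbone trained exclusively with cross-entropy. This cross-modal gradient asymmetry is a spectral dimensionality mismatch between low-rank MSE regression gradients and the high-dimensional semantic manifold sculpted by CE pre-training, and it causes rapid erosion of the VLM's visual-question-answering (VQA) capability. Industry-standard defences either sever the gradient pathway entirely with stop-gradient, discarding the rich continuous supervision, or restrict parameter capacity through low-rank adapters (LoRA) that constrain the rank of updates, and even then they overwrite the pre-trained manifold. AEGIS (Anchor-Enforced Gradient Isolation System) is a buffer-free, layer-wise orthogonal gradient projection framework that enables direct continuous MSE learning while preserving the pre-trained VQA manifold. AEGIS pre-computes a static Gaussian reference anchor from masked VQA forward passes across all transformer layers and, at each training step, constructs a Wasserstein-2 transport penalty that generates an anchor restoration gradient. A sequential dual-backward pass decomposes the task and anchor gradients, and for each transformer layer AEGIS applies a single Gram-Schmidt orthogonal projection that steers the task gradient away from the destructive direction while preserving its constructive content. The projection sheds less than 1% of gradient energy on average, yet removes the cumulative activation drift that causes severe forgetting.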
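The per-layer Gram-Schmidt step described in the abstract can be sketched as a single vector projection. The example below is a hedged illustration, not the paper's code: `project_away` and `energy_shed` are hypothetical names, gradients are flattened to plain lists, and the rule of projecting only when the task gradient conflicts with the anchor gradient (negative dot product) is an assumption, since the abstract does not state when the projection is applied.

```python
def project_away(task_grad, anchor_grad, only_if_conflicting=True, eps=1e-12):
    """Remove from task_grad its component along anchor_grad.

    One Gram-Schmidt step: g_task - (<g_task, g_anchor> / ||g_anchor||^2) * g_anchor.
    Assumption: with only_if_conflicting=True the projection is skipped
    when the dot product is non-negative, i.e. when a descent step on the
    task loss would not increase the anchor penalty to first order.
    """
    dot = sum(t * a for t, a in zip(task_grad, anchor_grad))
    norm_sq = sum(a * a for a in anchor_grad)
    if norm_sq < eps or (only_if_conflicting and dot >= 0.0):
        return list(task_grad)
    scale = dot / norm_sq
    return [t - scale * a for t, a in zip(task_grad, anchor_grad)]


def energy_shed(task_grad, projected):
    """Fraction of squared gradient norm removed by the projection."""
    before = sum(t * t for t in task_grad)
    after = sum(p * p for p in projected)
    return 0.0 if before == 0.0 else 1.0 - after / before


# Toy per-layer gradients (made-up numbers, one layer shown).
task = [1.0, -1.0]
anchor = [0.0, 2.0]
projected = project_away(task, anchor)
shed = energy_shed(task, projected)
```

After the projection the result is orthogonal to `anchor`, so a small step along it leaves the anchor penalty unchanged to first order. The toy vectors conflict strongly, so the fraction of energy removed here is far larger than the under-1% average the abstract reports for real training.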


Related papers