Fugu-MT Paper Translation (Abstract): Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic

Paper overview: Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic

arxiv url: http://arxiv.org/abs/2605.18993v2
Date: Fri, 22 May 2026 10:00:32 GMT
Status: Translation complete
Last updated in system: 2026-05-25 14:44:53.691582
Title: Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic
Authors: Thomas Sommariva, Francesca Morandi, Simone Calderara, Angelo Porrello
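Abstract summary: The paper bridges the gap between linear and standard non-linear fine-tuning. Hidden representations from a curvature-regularized linearized teacher are distilled into a non-linear student trained via conventional fine-tuning. The resulting model inherits key properties of linearized models for task arithmetic, enabling effective composition of task vectors and strong performance across vision and language benchmarks without any inference-time overhead.
Reference score (attention score computed independently by the site): 17.222346684974607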
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Task vector composition has emerged as a promising paradigm for editing pre-trained models, enabling model merging through addition and unlearning through subtraction. Fine-tuning in the tangent space of a pre-trained model (linear fine-tuning) has proven effective, as it produces task vectors that are naturally disentangled and resistant to interference. However, linearized models suffer from limited expressivity during training and incur higher computational costs at inference time, which restrict their practical applicability. In this work, we bridge the gap between linear and standard non-linear fine-tuning. We show that linearity with respect to weight perturbations, a property defined in parameter space, can be enforced through constraints in activation space during training. Concretely, we distill hidden representations from a curvature-regularized linearized teacher into a non-linear student trained via conventional fine-tuning. We find that the resulting model inherits key properties of linearized models for task arithmetic, enabling effective composition of task vectors and achieving strong performance across vision and language benchmarks without incurring any inference-time overhead.
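The two ideas the abstract builds on, task vector arithmetic and linearization in weight space, can be illustrated with a minimal Python sketch. This is not the paper's method or code: the toy model `tanh(w0 * x) + w1`, the weight values, and all function names are hypothetical, chosen only to show that a model linearized around the pre-trained weights responds additively to summed task vectors. The distillation of hidden representations from teacher to student is not shown.

```python
import math

# Illustrative sketch only: weights are flat lists of floats and the
# "model" is a hypothetical toy function, not the networks from the paper.

def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned weights minus pre-trained weights."""
    return [f - p for p, f in zip(pretrained, finetuned)]

def apply_task_vectors(pretrained, vectors, alpha=1.0):
    """Edit pre-trained weights by adding scaled task vectors.

    alpha > 0 merges tasks (addition); alpha < 0 subtracts a task
    (the unlearning direction described in the abstract).
    """
    edited = list(pretrained)
    for vec in vectors:
        edited = [w + alpha * v for w, v in zip(edited, vec)]
    return edited

def model(w, x):
    """Hypothetical non-linear toy model: tanh(w[0] * x) + w[1]."""
    return math.tanh(w[0] * x) + w[1]

def linearized_model(w, w0, x):
    """First-order Taylor expansion of `model` in the weights, around w0."""
    t = math.tanh(w0[0] * x)
    grad = [(1.0 - t * t) * x, 1.0]  # gradient of model w.r.t. weights at w0
    return model(w0, x) + sum(g * (wi - w0i) for g, wi, w0i in zip(grad, w, w0))

# Assumed toy weights: one pre-trained point and two fine-tuned points.
theta0 = [0.5, 0.0]
theta_a = [0.75, 0.25]
theta_b = [0.5, -0.5]

tau_a = task_vector(theta0, theta_a)
tau_b = task_vector(theta0, theta_b)
merged = apply_task_vectors(theta0, [tau_a, tau_b])          # addition
forgot_a = apply_task_vectors(theta0, [tau_a], alpha=-1.0)   # subtraction
```

Because `linearized_model` is affine in the weights, the change in its output for `merged` equals the sum of the changes for `theta_a` and `theta_b` taken separately; the non-linear `model` carries no such guarantee, which is the gap the abstract says it aims to close.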


Related papers