Fugu-MT Paper Translation (Abstract): Memory-Efficient Transfer Learning with Fading Side Networks via Masked Dual Path Distillation

Paper overview: Memory-Efficient Transfer Learning with Fading Side Networks via Masked Dual Path Distillation

arxiv url: http://arxiv.org/abs/2604.09088v1
Date: Fri, 10 Apr 2026 08:16:59 GMT
Status: Translation complete
Last updated in system: 2026-04-13 17:57:53.769036
Title: Memory-Efficient Transfer Learning with Fading Side Networks via Masked Dual Path Distillation
Title (reference translation): Memory-Efficient Transfer Learning with Fading Side Networks via Masked Dual Path Distillation
Authors: Yutong Zhang, Jiaxin Chen, Honglin Chen, Kaiqi Zheng, Shengcai Liao, Hanwen Zhong, Weixin Li, Yunhong Wang
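Abstract summary: We propose a new approach called Masked Dual Path Distillation (MDPD). MDPD accelerates inference by at least 25.2% while keeping parameter and memory consumption comparable, and it markedly improves accuracy compared to SOTA approaches.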
Reference score (independently computed attention score): 41.8703974624689
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Memory-efficient transfer learning (METL) approaches have recently achieved promising performance in adapting pre-trained models to downstream tasks. They avoid applying gradient backpropagation in large backbones, thus significantly reducing both the number of trainable parameters and the memory consumption during fine-tuning. However, since they typically employ a lightweight and learnable side network, these methods inevitably introduce additional memory and time overhead during inference, which contradicts the ultimate goal of efficient transfer learning. To address this issue, we propose a novel approach dubbed Masked Dual Path Distillation (MDPD) to accelerate inference while retaining parameter and memory efficiency in fine-tuning with fading side networks. Specifically, MDPD develops a framework that enhances performance by mutually distilling the frozen backbones and learnable side networks in fine-tuning, and discards the side network during inference without sacrificing accuracy. Moreover, we design a novel feature-based knowledge distillation method for the encoder structure with multiple layers. Extensive experiments on distinct backbones across vision/language-only and vision-and-language tasks demonstrate that our method not only accelerates inference by at least 25.2% while keeping parameter and memory consumption comparable, but also remarkably improves accuracy compared to SOTA approaches. The source code is available at https://github.com/Zhang-VKk/MDPD.
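Abstract (reference translation): Memory-efficient transfer learning (METL) approaches have recently achieved promising performance in adapting pre-trained models to downstream tasks. Because they avoid gradient backpropagation through large backbones, they substantially reduce both the number of trainable parameters and memory consumption during fine-tuning. However, because they typically use a lightweight, learnable side network, these methods inevitably introduce memory and time overhead at inference. To address this, we propose a new approach called Masked Dual Path Distillation (MDPD), which accelerates inference while retaining parameter and memory efficiency in fine-tuning with fading side networks. Specifically, MDPD develops a framework that improves performance by mutually distilling the frozen backbone and the learnable side network during fine-tuning, and that discards the side network at inference without sacrificing accuracy. In addition, we design a new feature-based knowledge distillation method for multi-layer encoder structures. Extensive experiments on different backbones across vision-only, language-only, and vision-and-language tasks show that our method not only accelerates inference by at least 25.2% while keeping parameter and memory consumption comparable, but also markedly improves accuracy compared to SOTA approaches. The source code is available at https://github.com/Zhang-VKk/MDPD.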
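
The abstract mentions a feature-based distillation over a multi-layer encoder but gives no loss definition. The following is a generic masked feature-distillation sketch in plain Python, not the authors' implementation: the function name, the random token masking, the per-layer mean squared error, and the `mask_ratio` parameter are all assumptions made for illustration. Gradient handling (which path is frozen, which is updated) is omitted.

```python
import random

def masked_feature_distillation(teacher_layers, student_layers, mask_ratio=0.5, seed=0):
    """Hypothetical loss: MSE between teacher and student features,
    computed only on a random subset of token positions and averaged over layers.

    Each argument is a list of layers; each layer is a list of token vectors.
    """
    rng = random.Random(seed)
    total = 0.0
    for t_feats, s_feats in zip(teacher_layers, student_layers):
        n_tokens = len(t_feats)
        # Assumption: select at least one token position per layer.
        n_selected = max(1, int(n_tokens * mask_ratio))
        selected = rng.sample(range(n_tokens), n_selected)
        layer_loss = 0.0
        for i in selected:
            t, s = t_feats[i], s_feats[i]
            # Mean squared difference over the feature dimension.
            layer_loss += sum((a - b) ** 2 for a, b in zip(t, s)) / len(t)
        total += layer_loss / n_selected
    return total / len(teacher_layers)

# Toy features: one layer, two tokens, two dimensions.
backbone_feats = [[[1.0, 1.0], [1.0, 1.0]]]
side_feats = [[[0.0, 0.0], [0.0, 0.0]]]
loss = masked_feature_distillation(backbone_feats, side_feats)
```

In a dual-path setting, a loss of this kind could be evaluated once per direction (backbone to side network, and side network to backbone) and summed. How MDPD actually combines the two paths is described in the paper and the linked repository, not here.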


Related papers