Fugu-MT Paper Translation (Abstract): TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation

Paper overview: TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation

arxiv url: http://arxiv.org/abs/2202.13393v4
Date: Thu, 5 Sep 2024 00:18:40 GMT
Status: Translation complete
System update date: 2024-09-07 07:30:16.607752
Title: TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation
Authors: Ruiping Liu, Kailun Yang, Alina Roitberg, Jiaming Zhang, Kunyu Peng, Huayao Liu, Yaonan Wang, Rainer Stiefelhagen
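Abstract summary: The Transformer-based Knowledge Distillation (TransKD) framework learns compact student transformers by distilling both the feature maps and the patch embeddings of large teacher transformers. Experiments on the Cityscapes, ACDC, NYUv2, and Pascal VOC2012 datasets show that TransKD outperforms state-of-the-art distillation frameworks.
Reference score (independently computed attention metric): 49.794142076551026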
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic segmentation benchmarks in the realm of autonomous driving are dominated by large pre-trained transformers, yet their widespread adoption is impeded by substantial computational costs and prolonged training durations. To lift this constraint, we look at efficient semantic segmentation from a perspective of comprehensive knowledge distillation and aim to bridge the gap between multi-source knowledge extractions and transformer-specific patch embeddings. We put forward the Transformer-based Knowledge Distillation (TransKD) framework which learns compact student transformers by distilling both feature maps and patch embeddings of large teacher transformers, bypassing the long pre-training process and reducing the FLOPs by >85.0%. Specifically, we propose two fundamental modules to realize feature map distillation and patch embedding distillation, respectively: (1) Cross Selective Fusion (CSF) enables knowledge transfer between cross-stage features via channel attention and feature map distillation within hierarchical transformers; (2) Patch Embedding Alignment (PEA) performs dimensional transformation within the patchifying process to facilitate the patch embedding distillation. Furthermore, we introduce two optimization modules to enhance the patch embedding distillation from different perspectives: (1) Global-Local Context Mixer (GL-Mixer) extracts both global and local information of a representative embedding; (2) Embedding Assistant (EA) acts as an embedding method to seamlessly bridge teacher and student models with the teacher's number of channels. Experiments on Cityscapes, ACDC, NYUv2, and Pascal VOC2012 datasets show that TransKD outperforms state-of-the-art distillation frameworks and rivals the time-consuming pre-training method. The source code is publicly available at https://github.com/RuipingL/TransKD.
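The two distillation signals named in the abstract (feature maps and patch embeddings) can be illustrated with a minimal sketch. This is not the TransKD implementation: it swaps in a plain linear projection and a mean squared error for the paper's CSF and PEA modules, and every name here (`project`, `mse`, `distillation_loss`, the weights `alpha` and `beta`) is a hypothetical chosen for illustration. The authors' actual code is in the repository linked above.

```python
from typing import List

# Rows are patch tokens, columns are channels.
Matrix = List[List[float]]


def project(tokens: Matrix, weight: Matrix) -> Matrix:
    """Map each token from student channels to teacher channels.

    `weight` has shape [student_channels][teacher_channels]. A plain
    linear map is an assumption standing in for the paper's alignment.
    """
    columns = list(zip(*weight))
    return [
        [sum(t * w for t, w in zip(token, col)) for col in columns]
        for token in tokens
    ]


def mse(a: Matrix, b: Matrix) -> float:
    """Mean squared error over all entries of two same-shaped matrices."""
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def distillation_loss(
    student_feat: Matrix,
    teacher_feat: Matrix,
    student_emb: Matrix,
    teacher_emb: Matrix,
    w_feat: Matrix,
    w_emb: Matrix,
    alpha: float = 1.0,  # hypothetical weight on the feature-map term
    beta: float = 1.0,   # hypothetical weight on the patch-embedding term
) -> float:
    """Weighted sum of a feature-map term and a patch-embedding term."""
    feat_term = mse(project(student_feat, w_feat), teacher_feat)
    emb_term = mse(project(student_emb, w_emb), teacher_emb)
    return alpha * feat_term + beta * emb_term


# Toy example: one token, 2 student channels projected to 3 teacher channels.
w = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
loss = distillation_loss(
    student_feat=[[1.0, 2.0]], teacher_feat=[[1.0, 2.0, 3.0]],
    student_emb=[[1.0, 2.0]], teacher_emb=[[1.0, 2.0, 5.0]],
    w_feat=w, w_emb=w, alpha=1.0, beta=0.5,
)
```

In training, a term like this would typically be added to the student's ordinary segmentation loss. The projection is needed because the compact student has fewer channels than the teacher.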

Related papers