Fugu-MT Paper Translation (Abstract): TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation

Paper overview: TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation

arxiv url: http://arxiv.org/abs/2202.13393v3
Date: Sun, 24 Dec 2023 07:59:29 GMT
Status: Translation complete
Last updated in system: 2023-12-28 02:20:04.670511
Title: TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation
Authors: Ruiping Liu, Kailun Yang, Alina Roitberg, Jiaming Zhang, Kunyu Peng, Huayao Liu, Yaonan Wang, Rainer Stiefelhagen
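Abstract summary: The Transformer-based Knowledge Distillation (TransKD) framework learns compact student transformers by distilling both the feature maps and the patch embeddings of large teacher transformers. Experiments on the Cityscapes, ACDC, NYUv2, and Pascal VOC2012 datasets show that TransKD outperforms state-of-the-art distillation frameworks and rivals the time-consuming pre-training method.
Reference score (attention score computed independently by this site): 51.93878604106518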
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic segmentation benchmarks in the realm of autonomous driving are dominated by large pre-trained transformers, yet their widespread adoption is impeded by substantial computational costs and prolonged training durations. To lift this constraint, we look at efficient semantic segmentation from the perspective of comprehensive knowledge distillation and consider bridging the gap between multi-source knowledge extractions and transformer-specific patch embeddings. We put forward the Transformer-based Knowledge Distillation (TransKD) framework, which learns compact student transformers by distilling both feature maps and patch embeddings of large teacher transformers, bypassing the long pre-training process and reducing the FLOPs by >85.0%. Specifically, we propose two fundamental and two optimization modules: (1) Cross Selective Fusion (CSF) enables knowledge transfer between cross-stage features via channel attention and feature map distillation within hierarchical transformers; (2) Patch Embedding Alignment (PEA) performs dimensional transformation within the patchifying process to facilitate the patch embedding distillation; (3) Global-Local Context Mixer (GL-Mixer) extracts both global and local information of a representative embedding; (4) Embedding Assistant (EA) acts as an embedding method to seamlessly bridge teacher and student models with the teacher's number of channels. Experiments on Cityscapes, ACDC, NYUv2, and Pascal VOC2012 datasets show that TransKD outperforms state-of-the-art distillation frameworks and rivals the time-consuming pre-training method. The source code is publicly available at https://github.com/RuipingL/TransKD.
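The abstract describes distilling patch embeddings after a dimensional transformation that matches the student to the teacher's number of channels. The following is only a minimal sketch of that general idea, assuming a plain linear projection and a mean-squared-error loss; it is not the authors' implementation (the repository linked in the abstract is the reference for that). The function names, tensor sizes, and the random projection matrix are all hypothetical and chosen for illustration.

```python
import numpy as np


def project_channels(student_feat, weight):
    """Map student features of shape (N, C_s) to the teacher width (N, C_t).

    Assumption: a single linear map stands in for the dimensional
    transformation; the paper's actual module may differ.
    """
    return student_feat @ weight


def mse_distillation_loss(student_feat, teacher_feat, weight):
    """Mean squared error between projected student and teacher features."""
    aligned = project_channels(student_feat, weight)
    return float(np.mean((aligned - teacher_feat) ** 2))


# Hypothetical sizes: 16 patch tokens, 32 student channels, 64 teacher channels.
rng = np.random.default_rng(0)
num_tokens, c_student, c_teacher = 16, 32, 64
student_embed = rng.standard_normal((num_tokens, c_student))
teacher_embed = rng.standard_normal((num_tokens, c_teacher))
proj = rng.standard_normal((c_student, c_teacher)) * 0.1

loss = mse_distillation_loss(student_embed, teacher_embed, proj)
print(f"patch-embedding distillation loss: {loss:.4f}")
```

In a real training loop the projection would be learned jointly with the student, and a loss of this kind would be added to the segmentation loss; the same pattern could be applied per stage to feature maps.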

Related papers

Data Efficient Any Transformer-to-Mamba Distillation via Attention Bridge [54.948715010753745]
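State space models (SSMs) have emerged as efficient alternatives to transformers for sequence modeling, offering superior scalability through their recurrent structure. In this work, we propose Cross-architecture distillation via Attention Bridge (CAB), a novel data-efficient distillation framework that efficiently transfers attention knowledge from transformer teachers to state-space student models. Our findings suggest that attention-based knowledge can be efficiently transferred to recurrent models, enabling rapid use of Transformer expertise to build a stronger SSM community.
Paper reference translation (metadata) (2025-10-22T05:56:14Z)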
CLoCKDistill: Consistent Location-and-Context-aware Knowledge Distillation for DETRs [2.7624021966289605]
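We propose Consistent Location-and-Context-aware Knowledge Distillation (CLoCKDistill) for DETR detectors. We distill the transformer encoder output (memory), which contains valuable global context and long-range dependencies. Our method improves the performance of student detectors by 2.2% to 6.4%.
Paper reference translation (metadata) (2025-02-15T06:02:51Z)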
BEExformer: A Fast Inferencing Binarized Transformer with Early Exits [2.7651063843287718]
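We introduce BEExformer, the first selective-learning-based transformer that integrates a Binarized Early Exit Transformer (BAT) with Early Exit (EE). BAT uses a differentiable second-order approximation to the sign function, enabling gradients that capture both the sign and the magnitude of the weights. The EE mechanism hinges on a fractional reduction in entropy among intermediate transformer blocks with soft-routing loss estimation. This accelerates inference by reducing FLOPs by 52.08% and improves accuracy by 2.89% by resolving the "overthinking" problem inherent in deep networks.
Paper reference translation (metadata) (2024-12-06T17:58:14Z)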
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers [22.1372572833618]
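We propose a novel few-shot feature distillation approach for vision transformers. First, we copy the weights from intermittent layers of existing vision transformers into shallower architectures (students). Next, we use an enhanced version of Low-Rank Adaptation (LoRA) to distill knowledge into the student in a few-shot scenario.
Paper reference translation (metadata) (2024-04-14T18:57:38Z)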
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers [1.894259749028573]
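We propose COMEDIAN, a novel pipeline for initializing transformers for action spotting. Our results highlight the benefits of the pretraining pipeline, including improved performance and faster convergence compared with non-pretrained models.
Paper reference translation (metadata) (2023-09-03T20:50:53Z)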
ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval [54.54667085792404]
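We propose a novel distillation method that significantly advances cross-architecture distillation for dual-encoders. The method introduces self on-the-fly distillation, which can effectively distill late interaction (ColBERT) into a vanilla dual-encoder, and also incorporates a cascade distillation process to further improve performance with a cross-encoder teacher.
Paper reference translation (metadata) (2022-05-18T18:05:13Z)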
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers [91.6129538027725]
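We propose an early knowledge distillation framework, termed DearKD, to improve the data efficiency required by transformers. DearKD is a two-stage framework that first distills inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation.
Paper reference translation (metadata) (2022-04-27T15:11:04Z)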
XAI for Transformers: Better Explanations through Conservative Propagation [60.67748036747221]
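We show that the gradient in a transformer reflects the function only locally and therefore fails to reliably identify the contribution of input features to the prediction. Our proposal can be seen as a proper extension of the well-established LRP method to transformers.
Paper reference translation (metadata) (2022-02-15T10:47:11Z)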
Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition [124.80263629921498]
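We propose Pixel Distillation, which extends knowledge distillation to the input level while simultaneously breaking architecture constraints. Such a scheme enables flexible cost control for deployment, because both the network architecture and the image quality can be adjusted according to the overall resource requirements.
Paper reference translation (metadata) (2021-12-17T14:31:40Z)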
Efficient Vision Transformers via Fine-Grained Manifold Distillation [96.50513363752836]
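Vision transformer architectures have shown extraordinary performance on many computer vision tasks. Although network performance is boosted, transformers often require more computational resources. In this paper, we propose to extract useful information from the teacher transformer through the relationship between images and their divided patches.
Paper reference translation (metadata) (2021-07-03T08:28:34Z)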

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
