Fugu-MT Paper Translation (Summary): A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation

Paper summary: A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation

arxiv url: http://arxiv.org/abs/2303.13212v2
Date: Fri, 24 Mar 2023 02:40:47 GMT
Status: Translation complete
Last updated in system: 2023-03-27 11:12:55.717225
Title: A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation
Title (reference translation): A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation
Authors: Ziwei Liu, Yongtao Wang, Xiaojie Chu
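Abstract summary: The paper proposes a learnable nonlinear channel-wise transformation that aligns the features of the student model with those of the teacher model. The method achieves significant performance improvements across a range of computer vision tasks.
Reference score (attention score computed independently by this site): 35.233203757760066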
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Knowledge distillation is a popular technique for transferring knowledge from a large teacher model to a smaller student model by mimicking. However, distillation by directly aligning the feature maps of the teacher and the student may impose overly strict constraints on the student and thus degrade its performance. To alleviate this feature misalignment issue, existing works mainly focus on spatially aligning the feature maps of the teacher and the student with a pixel-wise transformation. In this paper, we find that aligning the feature maps of the teacher and the student along the channel dimension is also effective for addressing the feature misalignment issue. Specifically, we propose a learnable nonlinear channel-wise transformation to align the features of the student and the teacher model. Based on it, we further propose a simple and generic framework for feature distillation, with only one hyper-parameter to balance the distillation loss and the task-specific loss. Extensive experimental results show that our method achieves significant performance improvements in various computer vision tasks, including image classification (+3.28% top-1 accuracy for MobileNetV1 on ImageNet-1K), object detection (+3.9% bbox mAP for ResNet50-based Faster-RCNN on MS COCO), instance segmentation (+2.8% Mask mAP for ResNet50-based Mask-RCNN), and semantic segmentation (+4.66% mIoU for ResNet18-based PSPNet on Cityscapes), which demonstrates the effectiveness and versatility of the proposed method. The code will be made publicly available.
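Abstract (reference translation): Knowledge distillation is a common technique for transferring knowledge from a large teacher model to a small student model by imitation. However, directly aligning the feature maps of the teacher and the student can impose overly strict constraints on the student, which degrades the student model's performance. To mitigate this feature misalignment problem, existing work focuses on spatially aligning the teacher's and student's feature maps with a pixel-wise transformation. This paper finds that aligning the teacher's and student's feature maps along the channel dimension is also effective for addressing the feature misalignment problem. Specifically, it proposes a learnable nonlinear channel-wise transformation to align the features of the student model and the teacher model. Building on this, it further proposes a simple and generic feature distillation framework with only one hyper-parameter, which balances the distillation loss and the task-specific loss. Extensive experimental results show that the method achieves significant performance improvements in various computer vision tasks, including image classification (+3.28% top-1 accuracy for MobileNetV1 on ImageNet-1K), object detection (+3.9% bbox mAP for ResNet50-based Faster-RCNN on MS COCO), instance segmentation (+2.8% Mask mAP for ResNet50-based Mask-RCNN), and semantic segmentation (+4.66% mIoU for ResNet18-based PSPNet on Cityscapes), which demonstrates the effectiveness and versatility of the proposed method. The code will be made publicly available.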
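
The abstract names two ingredients: a learnable nonlinear transformation applied along the channel dimension, and a single hyper-parameter that balances the distillation loss against the task loss. The NumPy sketch below illustrates that idea only; it is not the authors' code. The two-layer MLP with ReLU, the mean-squared-error distillation loss, the hidden width, and all names (`init_channel_mlp`, `channel_wise_transform`, `alpha`, and so on) are assumptions made for illustration, and the training step is omitted.

```python
import numpy as np


def init_channel_mlp(c_student, c_hidden, c_teacher, rng):
    """Create weights for a two-layer MLP over channels (assumed architecture)."""
    return {
        "w1": rng.normal(0.0, np.sqrt(2.0 / c_student), size=(c_student, c_hidden)),
        "b1": np.zeros(c_hidden),
        "w2": rng.normal(0.0, np.sqrt(2.0 / c_hidden), size=(c_hidden, c_teacher)),
        "b2": np.zeros(c_teacher),
    }


def channel_wise_transform(feat, params):
    """Map (N, C_student, H, W) to (N, C_teacher, H, W).

    The same nonlinear map is applied to the channel vector at every
    spatial location, so no information is mixed across pixels.
    """
    x = np.moveaxis(feat, 1, -1)                           # (N, H, W, C_student)
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)   # ReLU nonlinearity
    y = h @ params["w2"] + params["b2"]                    # (N, H, W, C_teacher)
    return np.moveaxis(y, -1, 1)


def distillation_loss(student_feat, teacher_feat, params):
    """Mean squared error between transformed student and teacher features (assumed loss)."""
    aligned = channel_wise_transform(student_feat, params)
    return float(np.mean((aligned - teacher_feat) ** 2))


def total_loss(task_loss, distill_loss, alpha):
    """Task loss plus the distillation loss weighted by one hyper-parameter."""
    return task_loss + alpha * distill_loss


# Toy feature maps: a 64-channel student and a 256-channel teacher.
rng = np.random.default_rng(0)
student = rng.normal(size=(2, 64, 8, 8))
teacher = rng.normal(size=(2, 256, 8, 8))
params = init_channel_mlp(64, 128, 256, rng)

aligned = channel_wise_transform(student, params)
l_distill = distillation_loss(student, teacher, params)
loss = total_loss(task_loss=1.5, distill_loss=l_distill, alpha=2.0)
print(aligned.shape, loss)
```

Applying a linear layer to the channel vector at each pixel is equivalent to a 1x1 convolution, so in a deep learning framework the same transform could be written as two 1x1 convolutions with a ReLU between them and trained jointly with the student.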

Related papers

ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation [2.7624021966289605]
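ACAM-KD adapts to the student's needs throughout the distillation process. It improves object detection performance by up to 1.4 mAP over the state of the art. For semantic segmentation on Cityscapes, it raises mIoU by 3.09 over the baseline.
Paper reference translation (metadata) (2025-03-08T18:51:53Z)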
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers [15.446480934024652]
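This paper proposes ScaleKD, a simple and effective knowledge distillation method. The method can train student backbones spanning various convolutional neural network (CNN), multilayer perceptron (MLP), and ViT architectures on image classification datasets. When the size of the teacher model or of its pre-training dataset is scaled up, the proposed method shows the desired scalable properties.
Paper reference translation (metadata) (2024-11-11T08:25:21Z)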
Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation [56.053397775016755]
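This paper proposes a sequential approach to knowledge distillation that progressively transfers the knowledge of teacher detectors to a student. To the best of the authors' knowledge, this is the first work to distill knowledge from Transformer-based teacher detectors into a convolution-based student.
Paper reference translation (metadata) (2023-08-17T17:17:08Z)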
Improving Knowledge Distillation via Regularizing Feature Norm and Direction [16.98806338782858]
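Knowledge distillation (KD) uses a large trained model (i.e., the teacher) to train a small student model on the same dataset for the same task. Treating the teacher's features as knowledge, a student trained with knowledge distillation aligns its features with the teacher's, for example by minimizing the KL divergence between logits or the L2 distance between intermediate features. It is natural to assume that aligning the student's features more closely with the teacher's distills the teacher's knowledge better, but simply enforcing this alignment does not directly contribute to the student's performance.
Paper reference translation (metadata) (2023-05-26T15:05:19Z)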
NORM: Knowledge Distillation via N-to-One Representation Matching [18.973254404242507]
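This paper proposes a two-stage knowledge distillation method that relies on a simple feature transform (FT) module consisting of two linear layers. To preserve the intact information learned by the teacher network, the FT module is inserted only after the last convolutional layer of the student network. The expanded student representation is split sequentially into N non-overlapping feature segments, each with the same number of feature channels as the teacher, and these segments are made to approximate the teacher representation simultaneously.
Paper reference translation (metadata) (2023-05-23T08:15:45Z)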
A Light-weight Deep Learning Model for Remote Sensing Image Classification [70.66164876551674]
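The paper proposes a high-performance, lightweight deep learning model for remote sensing image classification (RSIC). Extensive experiments on the NWPU-RESISC45 benchmark show that the proposed teacher-student model outperforms state-of-the-art systems.
Paper reference translation (metadata) (2023-02-25T09:02:01Z)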
AMD: Adaptive Masked Distillation for Object [8.668808292258706]
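This work proposes a spatial-channel adaptive masked distillation (AMD) network for object detection. A simple and efficient module is used to make the student network channel-adaptive. With the proposed method, the student networks report mAP scores of 41.3%, 42.4%, and 42.7%.
Paper reference translation (metadata) (2023-01-31T10:32:13Z)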
Masked Generative Distillation [23.52519832438352]
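Masked Generative Distillation (MGD) is a general feature-based distillation method. The paper shows that a teacher can improve the student's representation power by guiding the student's feature recovery.
Paper reference translation (metadata) (2022-05-03T14:30:26Z)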
Deep Structured Instance Graph for Distilling Object Detectors [82.16270736573176]
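This paper proposes a simple knowledge structure that exploits the information inside a detection system to facilitate the distillation of detection knowledge. It achieves new state-of-the-art results on the challenging COCO object detection task with diverse student-teacher pairs on both one-stage and two-stage detectors.
Paper reference translation (metadata) (2021-09-27T08:26:00Z)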
DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning [94.89221799550593]
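Self-supervised representation learning (SSL) has received widespread attention from the community. Recent studies argue that its performance degrades as model size decreases. The paper proposes a simple yet effective method, Distilled Contrastive Learning (DisCo), that alleviates this problem by a large margin.
Paper reference translation (metadata) (2021-04-19T08:22:52Z)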
Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
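Current state-of-the-art object detectors come at a high computational cost and are difficult to deploy on low-end devices. Knowledge distillation, which aims to train a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
Paper reference translation (metadata) (2020-06-23T15:58:22Z)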
ResNeSt: Split-Attention Networks [86.25490825631763]
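This architecture applies channel-wise attention across different network branches, leveraging their success in capturing cross-feature interactions and learning diverse representations. The model, named ResNeSt, outperforms EfficientNet in the accuracy and latency trade-off on image classification.
Paper reference translation (metadata) (2020-04-19T20:40:31Z)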

The related papers list is generated automatically from the titles and abstracts of papers on this site.

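This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).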