Fugu-MT Paper Translation (Summary): Frequency Attention for Knowledge Distillation

Paper overview: Frequency Attention for Knowledge Distillation

arxiv url: http://arxiv.org/abs/2403.05894v1
Date: Sat, 9 Mar 2024 12:18:48 GMT
Status: Translation complete
Last updated in system: 2024-03-13 11:51:29.634013
Title: Frequency Attention for Knowledge Distillation
Authors: Cuong Pham, Van-Anh Nguyen, Trung Le, Dinh Phung, Gustavo Carneiro, and Thanh-Toan Do
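Abstract summary: This paper proposes a novel module that functions as an attention mechanism in the frequency domain. The module consists of a learnable global filter that adjusts the frequencies of the student's features under the guidance of the teacher's features. The authors then use the proposed frequency attention module to build a knowledge review-based distillation model.
Reference score (attention score computed by this site): 34.54224300153788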
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Knowledge distillation is an attractive approach for learning compact deep neural networks, which learns a lightweight student model by distilling knowledge from a complex teacher model. Attention-based knowledge distillation is a specific form of intermediate feature-based knowledge distillation that uses attention mechanisms to encourage the student to better mimic the teacher. However, most of the previous attention-based distillation approaches perform attention in the spatial domain, which primarily affects local regions in the input image. This may not be sufficient when we need to capture the broader context or global information necessary for effective knowledge transfer. In frequency domain, since each frequency is determined from all pixels of the image in spatial domain, it can contain global information about the image. Inspired by the benefits of the frequency domain, we propose a novel module that functions as an attention mechanism in the frequency domain. The module consists of a learnable global filter that can adjust the frequencies of student's features under the guidance of the teacher's features, which encourages the student's features to have patterns similar to the teacher's features. We then propose an enhanced knowledge review-based distillation model by leveraging the proposed frequency attention module. The extensive experiments with various teacher and student architectures on image classification and object detection benchmark datasets show that the proposed approach outperforms other knowledge distillation methods.
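The abstract's key argument is that every frequency coefficient is computed from every pixel, so a filter applied in the frequency domain acts globally rather than on local regions. The Python sketch below illustrates that idea under explicit assumptions: single-channel feature maps stored as nested lists, a naive 2-D DFT instead of an FFT, a complex-valued global filter multiplied elementwise with the student's spectrum, and a mean-squared-error loss against the teacher's feature map. The names `dft2`, `idft2`, `frequency_attention`, and `distill_loss` are hypothetical. This is not the authors' implementation; in practice the filter would be trained by gradient descent in an autodiff framework, which is omitted here.

```python
import cmath
import math


def dft2(x):
    """Naive 2-D DFT of an H x W grid: each coefficient X[u][v] sums over ALL pixels."""
    h, w = len(x), len(x[0])
    return [[sum(x[m][n] * cmath.exp(-2j * math.pi * (u * m / h + v * n / w))
                 for m in range(h) for n in range(w))
             for v in range(w)]
            for u in range(h)]


def idft2(spec):
    """Inverse 2-D DFT (includes the 1/(H*W) normalization)."""
    h, w = len(spec), len(spec[0])
    return [[sum(spec[u][v] * cmath.exp(2j * math.pi * (u * m / h + v * n / w))
                 for u in range(h) for v in range(w)) / (h * w)
             for n in range(w)]
            for m in range(h)]


def frequency_attention(student, filt):
    """Hypothetical module: scale each frequency of the student map by a (learnable) complex weight."""
    spec = dft2(student)
    filtered = [[spec[u][v] * filt[u][v] for v in range(len(spec[0]))]
                for u in range(len(spec))]
    # Keep only the real part, since the input feature map is real-valued.
    return [[z.real for z in row] for row in idft2(filtered)]


def distill_loss(student, teacher, filt):
    """Assumed guidance signal: MSE between the filtered student map and the teacher map."""
    out = frequency_attention(student, filt)
    h, w = len(teacher), len(teacher[0])
    return sum((out[i][j] - teacher[i][j]) ** 2
               for i in range(h) for j in range(w)) / (h * w)


if __name__ == "__main__":
    student = [[1.0, 2.0], [3.0, 4.0]]
    dc_only = [[1, 0], [0, 0]]  # keep only the zero-frequency (global mean) coefficient
    print([[round(v, 6) for v in row] for row in frequency_attention(student, dc_only)])
    # -> [[2.5, 2.5], [2.5, 2.5]]
```

Keeping only the zero-frequency coefficient makes every output pixel equal to the mean of all input pixels, a small demonstration that a single frequency weight influences the whole map. An all-ones filter leaves the student map unchanged.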

Related papers

FiGKD: Fine-Grained Knowledge Distillation via High-Frequency Detail Transfer [0.0]
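Fine-Grained Knowledge Distillation (FiGKD) is a frequency-aware framework that decomposes a model's logits into low-frequency (content) and high-frequency (detail) components. FiGKD consistently outperforms state-of-the-art logit-based and feature-based distillation methods across various teacher-student configurations.
Paper reference translation (metadata) (2025-05-17T08:27:02Z)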
SAMKD: Spatial-aware Adaptive Masking Knowledge Distillation for Object Detection [4.33169417430713]
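A spatial-aware adaptive masking knowledge distillation framework is proposed for accurate object detection. The method improves the student network from 35.3% to 38.8%, outperforming state-of-the-art distillation methods.
Paper reference translation (metadata) (2025-01-13T07:26:37Z)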
Quantifying Knowledge Distillation Using Partial Information Decomposition [14.82261635235695]
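Knowledge distillation offers an effective way to deploy complex machine learning models in resource-constrained environments. This work quantifies the distillable and the distilled knowledge in a teacher's representation with respect to a given student and a downstream task. The authors demonstrate that the method can be applied in practice to distillation to address challenges arising from the complexity gap between teacher and student representations.
Paper reference translation (metadata) (2024-11-12T02:12:41Z)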
LAKD-Activation Mapping Distillation Based on Local Learning [12.230042188890838]
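This paper proposes Local Attention Knowledge Distillation (LAKD), a novel knowledge distillation framework. LAKD uses the distilled information from the teacher network more efficiently, achieving high interpretability and competitive performance. Experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets show that LAKD significantly outperforms existing methods.
Paper reference translation (metadata) (2024-08-21T09:43:27Z)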
Efficient Object Detection in Optical Remote Sensing Imagery via Attention-based Feature Distillation [29.821082433621868]
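This study proposes Attention-based Feature Distillation (AFD) for object detection, with a multi-instance attention mechanism that effectively distinguishes background from foreground elements. AFD efficiently matches the performance of other state-of-the-art models.
Paper reference translation (metadata) (2023-10-28T11:15:37Z)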
Knowledge Diffusion for Distillation [53.908314960324915]
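Addresses the representation gap between teacher and student in knowledge distillation (KD). The essence of existing methods for narrowing this gap is to discard noisy information and distill the valuable information in the features. A novel KD method called DiffKD is proposed, which uses diffusion models to explicitly denoise and match features.
Paper reference translation (metadata) (2023-05-25T04:49:34Z)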
Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
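This work proposes a novel method that semantically distills representational knowledge from a pretrained teacher to a target student. At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL). The method significantly outperforms state-of-the-art knowledge distillation methods on both coarse object classification and fine-grained face recognition tasks.
Paper reference translation (metadata) (2022-05-13T15:15:27Z)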
Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition [79.60708268515293]
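This paper studies how to train small, efficient networks for action recognition. It proposes two distillation strategies in the frequency domain: feature spectrum distillation and parameter distribution distillation. The proposed method outperforms state-of-the-art methods that use the same backbone.
Paper reference translation (metadata) (2020-09-15T07:29:57Z)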
Deep Reinforced Attention Learning for Quality-Aware Visual Recognition [73.15276998621582]
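The authors build a weakly supervised mechanism for generating intermediate attention maps in any convolutional neural network, and introduce a meta-critic network that evaluates the quality of the attention maps in the main network.
Paper reference translation (metadata) (2020-07-13T02:44:38Z)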
Knowledge Distillation Meets Self-Supervision [109.6400639148393]
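In knowledge distillation, "dark knowledge" is extracted from a teacher network to guide the learning of a student network. The authors show that a seemingly unrelated self-supervision task can serve as a simple yet powerful solution. By using the similarities between these self-supervision signals as an auxiliary task, hidden information can be transferred effectively from the teacher to the student.
Paper reference translation (metadata) (2020-06-12T12:18:52Z)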

The related papers list is generated automatically from the titles and abstracts of papers on this site.

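Information about the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).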