Fugu-MT Paper Translation (Abstract): CoMAD: A Multiple-Teacher Self-Supervised Distillation Framework

Paper abstract: CoMAD: A Multiple-Teacher Self-Supervised Distillation Framework

arxiv url: http://arxiv.org/abs/2508.04816v1
Date: Wed, 06 Aug 2025 18:55:14 GMT
Status: Translation complete
Last updated in system: 2025-08-08 18:59:39.620592
Title: CoMAD: A Multiple-Teacher Self-Supervised Distillation Framework
Title (reference translation): CoMAD: A Self-Supervised Distillation Framework with Multiple Teachers
Authors: Sriram Mandalika, Lalitha V
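Abstract summary: We introduce CoMAD (Consensus-oriented Masked Distillation), which unifies knowledge from self-supervised Vision Transformers into a compact student network. On ImageNet-1K, CoMAD's ViT-Tiny achieves 75.4% Top-1.
Reference score (independently computed attention score): 1.2172320168050466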
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Numerous self-supervised learning paradigms, such as contrastive learning and masked image modeling, learn powerful representations from unlabeled data but are typically pretrained in isolation, overlooking complementary insights and yielding large models that are impractical for resource-constrained deployment. To overcome these challenges, we introduce Consensus-oriented Masked Distillation (CoMAD), a lightweight, parameter-free framework that unifies knowledge from multiple current state-of-the-art self-supervised Vision Transformers into a compact student network. CoMAD distills from three pretrained ViT-Base teachers, MAE, MoCo v3, and iBOT, each offering distinct semantic and contextual priors. Rather than naively averaging teacher outputs, we apply asymmetric masking: the student sees only 25 percent of patches while each teacher receives a progressively lighter, unique mask, forcing the student to interpolate missing features under richer contexts. Teacher embeddings are aligned to the student's space via a linear adapter and layer normalization, then fused through our joint consensus gating, which weights each token by combining cosine affinity with inter-teacher agreement. The student is trained with dual-level KL divergence on visible tokens and reconstructed feature maps, capturing both local and global structure. On ImageNet-1K, CoMAD's ViT-Tiny achieves 75.4 percent Top-1, an increment of 0.4 percent over the previous state-of-the-art. In dense-prediction transfers, it attains 47.3 percent mIoU on ADE20K, and 44.5 percent box average precision and 40.5 percent mask average precision on MS-COCO, establishing a new state-of-the-art in compact SSL distillation.
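The asymmetric masking described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract only states that the student sees 25 percent of patches and that each teacher gets a progressively lighter, unique mask. The teacher mask ratios (0.60, 0.45, 0.30), the patch count of 196, and the function names `sample_visible` and `asymmetric_masks` are assumptions made for this example.

```python
import random


def sample_visible(num_patches, mask_ratio, rng):
    """Return sorted indices of the patches left visible after random masking."""
    num_visible = round(num_patches * (1.0 - mask_ratio))
    return sorted(rng.sample(range(num_patches), num_visible))


def asymmetric_masks(num_patches=196, student_mask=0.75,
                     teacher_masks=(0.60, 0.45, 0.30), seed=0):
    """Draw one heavy mask for the student and a lighter mask per teacher.

    The teacher ratios are hypothetical; the abstract does not list them.
    Each call to sample_visible draws a fresh random subset, so every
    teacher gets its own mask.
    """
    rng = random.Random(seed)
    student = sample_visible(num_patches, student_mask, rng)
    teachers = [sample_visible(num_patches, r, rng) for r in teacher_masks]
    return student, teachers


student_vis, teacher_vis = asymmetric_masks()
# The student keeps 25% of the patches; each teacher keeps more than the last.
print(len(student_vis), [len(t) for t in teacher_vis])
```

With a 14 x 14 patch grid (196 patches), the student keeps 49 patches while the three hypothetical teachers keep progressively more, so each teacher encodes a richer context than the student has access to.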
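Abstract (reference translation): Many self-supervised learning paradigms, such as contrastive learning and masked image modeling, learn powerful representations from unlabeled data but are typically pretrained in isolation. To overcome these challenges, we introduce Consensus-oriented Masked Distillation (CoMAD). CoMAD distills from three pretrained ViT-Base teachers: MAE, MoCo v3, and iBOT. Rather than naively averaging teacher outputs, we apply asymmetric masking: the student sees only 25% of patches, while each teacher receives a progressively lighter, unique mask, forcing the student to interpolate missing features under richer contexts. Teacher embeddings are aligned to the student's space via a linear adapter and layer normalization, then fused through joint consensus gating, which weights each token by combining cosine affinity with inter-teacher agreement. The student is trained with dual-level KL divergence on visible tokens and reconstructed feature maps, capturing both local and global structure. On ImageNet-1K, CoMAD's ViT-Tiny achieves 75.4% Top-1. In dense-prediction transfer, it attains 47.3% mIoU on ADE20K, and 44.5% box average precision and 40.5% mask average precision on MS-COCO, establishing a new state of the art in compact SSL distillation.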
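One possible reading of the consensus gating step is sketched below for a single token. The abstract says only that each token is weighted by combining cosine affinity with inter-teacher agreement; the exact formula is not given. Summing the two terms and normalizing with a softmax over teachers is an assumption of this sketch, as are the names `cosine`, `softmax`, and `consensus_gate`. The teacher embeddings are assumed to have already passed through the linear adapter and layer normalization, so they share the student's dimension.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def consensus_gate(student_tok, teacher_toks):
    """Fuse one token's teacher embeddings into a single target vector.

    Score per teacher = cosine affinity to the student token
                      + mean cosine agreement with the other teachers.
    This additive combination is a hypothetical choice for illustration.
    """
    scores = []
    for k, t in enumerate(teacher_toks):
        affinity = cosine(student_tok, t)
        others = [cosine(t, o) for j, o in enumerate(teacher_toks) if j != k]
        agreement = sum(others) / len(others) if others else 0.0
        scores.append(affinity + agreement)
    weights = softmax(scores)
    dim = len(student_tok)
    fused = [sum(w * t[d] for w, t in zip(weights, teacher_toks))
             for d in range(dim)]
    return weights, fused


# Toy example: two teachers roughly agree with the student, one disagrees.
student = [1.0, 0.0]
teachers = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
weights, fused = consensus_gate(student, teachers)
print(weights, fused)
```

In this toy case the third teacher disagrees with both the student and the other teachers, so it receives the smallest weight and contributes least to the fused target. A real implementation would apply this per token over batched tensors rather than Python lists.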

Related papers

CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation [7.478518822890964]
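We introduce CAST, a semi-supervised knowledge distillation (SSKD) framework that compresses pretrained vision foundation models (VFMs) into compact experts. It unfolds in three stages, of which this summary lists two: (1) domain adaptation of the VFM teacher via self-training with contrastive pixel calibration, (2) distillation into a compact student via a unified multi-objective loss. On Cityscapes and ADE20K, our 11x smaller student surpasses the adapted VFM teacher by +3.4 AP (33.9 vs. 30.5) and +1.5 AP (16.7 vs. 15.2), and outperforms the state of the art.
Paper reference translation (metadata) (2025-05-28T02:45:42Z)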
Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
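Contrastive learning (CL) for Vision Transformers (ViTs) in the image domain has achieved performance comparable to CL for traditional convolutional backbones. For 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling is dominant.
Paper reference translation (metadata) (2024-07-08T12:28:56Z)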
Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation [56.053397775016755]
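We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of teacher detectors to a student. To the best of our knowledge, we are the first to distill knowledge from Transformer-based teacher detectors into a convolution-based student.
Paper reference translation (metadata) (2023-08-17T17:17:08Z)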
Mixed Autoencoder for Self-supervised Visual Representation Learning [95.98114940999653]
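Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks by randomly masking image patches and reconstructing them. In this paper, we study mixing augmentation for MAE.
Paper reference translation (metadata) (2023-03-30T05:19:43Z)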
A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation [35.233203757760066]
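We propose a learnable nonlinear channel-wise transformation to align the features of the student with those of the teacher model. Our method achieves significant performance improvements on various computer vision tasks.
Paper reference translation (metadata) (2023-03-23T12:13:29Z)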
MOMA:Distill from Self-Supervised Teachers [6.737710830712818]
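We propose MOMA, which distills from pretrained MoCo and MAE in a self-supervised manner to extract knowledge from both paradigms. In experiments, MOMA delivers compact student models with performance comparable to existing state-of-the-art methods.
Paper reference translation (metadata) (2023-02-04T04:23:52Z)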
SdAE: Self-distillated Masked Autoencoder [95.3684955370897]
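We propose SdAE, a self-distillated masked autoencoder network. With 300 epochs of pretraining, a vanilla ViT-Base model achieves 84.1% fine-tuning accuracy on ImageNet-1k classification.
Paper reference translation (metadata) (2022-07-31T15:07:25Z)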
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training [52.04866462439979]
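Image BERT pre-training with masked image modeling (MIM) is a popular practice for self-supervised representation learning. We introduce mc-BEiT, an improved BERT-style image pre-training method.
Paper reference translation (metadata) (2022-03-29T09:08:18Z)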
G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation [49.421099172544196]
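We propose a semantic-guided feature imitation technique that automatically performs soft matching between feature pairs across all pyramid levels. We also introduce contrastive distillation to effectively capture the information encoded in the relationships between different feature regions. Our method (1) performs better than existing detection KD techniques when the components of the framework are used separately.
Paper reference translation (metadata) (2021-08-17T07:44:27Z)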

The related papers list is generated automatically from the titles and abstracts of papers on this site.

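This is information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and assumes no responsibility for any results arising from the use of this site (including all information and translations).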