Fugu-MT Paper Translation (Abstract): Attention, Please! Revisiting Attentive Probing for Masked Image Modeling

Paper overview: Attention, Please! Revisiting Attentive Probing for Masked Image Modeling

arxiv url: http://arxiv.org/abs/2506.10178v1
Date: Wed, 11 Jun 2025 21:10:26 GMT
Status: Translation complete
System update date: 2025-06-13 15:37:22.456958
Title: Attention, Please! Revisiting Attentive Probing for Masked Image Modeling
Title (reference translation): Revisiting Attentive Probing for Masked Image Modeling
Authors: Bill Psomas, Dionysis Christopoulos, Eirini Baltzi, Ioannis Kakogeorgiou, Tilemachos Aravanis, Nikos Komodakis, Konstantinos Karantzalos, Yannis Avrithis, Giorgos Tolias
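Abstract summary: We introduce efficient probing (EP), a mechanism that eliminates redundant projections, reduces the number of trainable parameters, and achieves up to a 10$\times$ speed-up over conventional multi-head attention. EP generalizes well beyond MIM to diverse pre-training paradigms, produces interpretable attention maps, and achieves strong gains in low-shot and layer-wise settings.
Reference score (independently computed attention score): 20.39513629593113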
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As fine-tuning (FT) becomes increasingly impractical at scale, probing is emerging as the preferred evaluation protocol for self-supervised learning (SSL). Yet, the standard linear probing (LP) fails to adequately reflect the potential of models trained with Masked Image Modeling (MIM), due to the distributed nature of patch tokens. This motivates the need for attentive probing, an alternative that uses attention to selectively aggregate patch-level features. Despite its growing adoption, attentive probing remains under-explored, with existing methods suffering from excessive parameterization and poor computational efficiency. In this work, we revisit attentive probing through the lens of the accuracy-efficiency trade-off. We conduct a systematic study of existing methods, analyzing their mechanisms and benchmarking their performance. We introduce efficient probing (EP), a multi-query cross-attention mechanism that eliminates redundant projections, reduces the number of trainable parameters, and achieves up to a 10$\times$ speed-up over conventional multi-head attention. Despite its simplicity, EP outperforms LP and prior attentive probing approaches across seven benchmarks, generalizes well beyond MIM to diverse pre-training paradigms, produces interpretable attention maps, and achieves strong gains in low-shot and layer-wise settings. Code available at https://github.com/billpsomas/efficient-probing.
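Abstract (reference translation): As fine-tuning (FT) becomes increasingly impractical at scale, probing is emerging as the preferred evaluation protocol for self-supervised learning (SSL). However, because of the distributed nature of patch tokens, standard linear probing (LP) cannot adequately reflect the potential of models trained with Masked Image Modeling (MIM). This motivates attentive probing, an alternative that uses attention to selectively aggregate patch-level features. Despite its growing adoption, existing methods suffer from excessive parameterization and poor computational efficiency. This work revisits attentive probing through the lens of the accuracy-efficiency trade-off. We systematically study existing methods, analyze their mechanisms, and benchmark their performance. We introduce efficient probing (EP), a multi-query cross-attention mechanism that eliminates redundant projections, reduces the number of trainable parameters, and achieves up to a 10$\times$ speed-up over conventional multi-head attention. Despite its simplicity, EP outperforms LP and prior attentive probing approaches across seven benchmarks, generalizes well beyond MIM to diverse pre-training paradigms, produces interpretable attention maps, and achieves strong gains in low-shot and layer-wise settings. Code is available at https://github.com/billpsomas/efficient-probing.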
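The abstract does not spell out how EP is implemented, so the following is only an illustrative sketch of attention-based pooling with several learnable queries, not the authors' code. It assumes that the patch tokens serve directly as keys and values (no key or value projection), that each query pools one contiguous slice of the feature channels, and that scores are scaled by 1/sqrt(D). The function name `multi_query_pool` and all shapes are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_pool(patches, queries):
    """Pool N patch tokens (N x D) with M queries (M x D).

    Assumption: D is divisible by M, and query j aggregates only the
    j-th channel slice of the raw patch tokens.
    Returns the pooled D-dim vector and the M x N attention maps.
    """
    dim = len(patches[0])
    num_queries = len(queries)
    if dim % num_queries != 0:
        raise ValueError("feature dim must be divisible by number of queries")
    chunk = dim // num_queries
    scale = 1.0 / math.sqrt(dim)

    pooled, attention_maps = [], []
    for j, query in enumerate(queries):
        # Similarity between this query and every patch token (keys = patches).
        scores = [scale * sum(q * x for q, x in zip(query, patch))
                  for patch in patches]
        attn = softmax(scores)
        attention_maps.append(attn)
        # Attention-weighted average of this query's channel slice.
        for c in range(j * chunk, (j + 1) * chunk):
            pooled.append(sum(a * patch[c] for a, patch in zip(attn, patches)))
    return pooled, attention_maps


# Usage: 16 random patch tokens of dimension 8, pooled by 4 queries.
random.seed(0)
tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
probes = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
feature, maps = multi_query_pool(tokens, probes)
print(len(feature), len(maps), len(maps[0]))
```

In a probing setup the pooled vector would feed a linear classifier while the backbone stays frozen. Under these assumptions only the queries and the classifier would be trained, which is one way to read the abstract's claim of fewer trainable parameters.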

Related papers