Fugu-MT Paper Translation (Abstract): Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning

Paper summary: Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning

arxiv url: http://arxiv.org/abs/2108.06647v1
Date: Sun, 15 Aug 2021 02:21:01 GMT
Status: Translation complete
System update date: 2021-08-17 15:12:22.091905
Title: Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning
Title (reference translation): Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning
Authors: Jiahao Wang, Yunhong Wang, Sheng Liu, Annan Li
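Abstract summary: Fine-grained action recognition is attracting attention because of the growing demand for specific action understanding in real-world applications. This work proposes the few-shot fine-grained action recognition problem, which aims to recognize novel fine-grained actions from only a few samples per class. Although progress has been made on coarse-grained actions, existing few-shot recognition methods encounter two issues when handling fine-grained actions.
Reference score (independently computed attention level): 51.03781020616402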
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications, whereas the data of rare fine-grained categories is very limited. Therefore, we propose the few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only few samples given for each class. Although progress has been made in coarse-grained actions, existing few-shot recognition methods encounter two issues handling fine-grained actions: the inability to capture subtle action details and the inadequacy in learning from data with low inter-class variance. To tackle the first issue, a human vision inspired bidirectional attention module (BAM) is proposed. Combining top-down task-driven signals with bottom-up salient stimuli, BAM captures subtle action details by accurately highlighting informative spatio-temporal regions. To address the second issue, we introduce contrastive meta-learning (CML). Compared with the widely adopted ProtoNet-based method, CML generates more discriminative video representations for low inter-class variance data, since it makes full use of potential contrastive pairs in each training episode. Furthermore, to fairly compare different models, we establish specific benchmark protocols on two large-scale fine-grained action recognition datasets. Extensive experiments show that our method consistently achieves state-of-the-art performance across evaluated tasks.
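Abstract (reference translation): Fine-grained action recognition is drawing growing attention as demand for specific action understanding rises in real-world applications, while data for rare fine-grained categories remains very limited. This work therefore proposes the few-shot fine-grained action recognition problem, which aims to recognize novel fine-grained actions when only a few samples are given for each class. Progress has been made on coarse-grained actions, but existing few-shot recognition methods run into two issues on fine-grained actions: they fail to capture subtle action details, and they learn inadequately from data with low inter-class variance. For the first issue, a bidirectional attention module (BAM) inspired by human vision is proposed. By combining top-down task-driven signals with bottom-up salient stimuli, BAM captures subtle action details by accurately highlighting informative spatio-temporal regions. For the second issue, contrastive meta-learning (CML) is introduced. Compared with the widely adopted ProtoNet-based method, CML produces more discriminative video representations for data with low inter-class variance, because it makes full use of the potential contrastive pairs in each training episode. In addition, to compare different models fairly, specific benchmark protocols are established on two large-scale fine-grained action recognition datasets. Extensive experiments show that the method consistently achieves state-of-the-art performance across the evaluated tasks.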
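The abstract contrasts a ProtoNet-based episode objective with one that uses every potential contrastive pair in an episode. The sketch below illustrates only that general contrast and is not the authors' CML: the loss is a generic supervised-contrastive-style formulation, and the function names, the cosine similarity, and the temperature value are assumptions chosen for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypes(support, labels):
    """ProtoNet-style prototypes: the mean support embedding of each class."""
    sums, counts = {}, {}
    for vec, lab in zip(support, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def protonet_predict(query, protos):
    """Assign the query to the class with the nearest prototype."""
    return min(protos, key=lambda lab: sq_dist(query, protos[lab]))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def episode_contrastive_loss(embeddings, labels, temperature=0.1):
    """Hypothetical pairwise loss: every same-class pair in the episode is a
    positive and every other sample is a negative (an assumption, not the
    loss defined in the paper)."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner contributes nothing
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors


# Toy 2-way 2-shot episode with hand-written 2-D embeddings.
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]

protos = prototypes(support, labels)
predicted = protonet_predict([0.8, 0.2], protos)
loss_clustered = episode_contrastive_loss(support, labels)
loss_mixed = episode_contrastive_loss(support, [0, 1, 0, 1])
```

In this toy episode the prototype route compares each query against one mean vector per class, while the pairwise loss scores every sample against all others, so embeddings whose classes are interleaved (`loss_mixed`) are penalized more heavily than well-clustered ones (`loss_clustered`).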

Related papers