Fugu-MT paper translation (abstract): Guided Interpretable Facial Expression Recognition via Spatial Action Unit Cues

Paper overview: Guided Interpretable Facial Expression Recognition via Spatial Action Unit Cues

arxiv url: http://arxiv.org/abs/2402.00281v3
Date: Thu, 25 Apr 2024 16:55:46 GMT
Status: translation complete
Last updated in system: 2024-04-26 21:08:18.263817
Title: Guided Interpretable Facial Expression Recognition via Spatial Action Unit Cues
Title (reference translation): Guiding facial expression recognition with spatial action unit cues
Authors: Soufiane Belharbi, Marco Pedersoli, Alessandro Lameiras Koerich, Simon Bacon, Eric Granger
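Abstract summary: A new learning strategy is proposed to explicitly incorporate AU cues into classifier training. The authors show that it can improve layer-wise interpretability without degrading classification performance.
Reference score (independently computed attention score): 55.97779732051921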
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Although state-of-the-art classifiers for facial expression recognition (FER) can achieve a high level of accuracy, they lack interpretability, an important feature for end-users. Experts typically associate spatial action units (AUs) from a codebook to facial regions for the visual interpretation of expressions. In this paper, the same expert steps are followed. A new learning strategy is proposed to explicitly incorporate AU cues into classifier training, making it possible to train deep interpretable models. During training, this AU codebook is used, along with the input image expression label and facial landmarks, to construct an AU heatmap that indicates the most discriminative image regions of interest with respect to the facial expression. This valuable spatial cue is leveraged to train a deep interpretable classifier for FER. This is achieved by constraining the spatial layer features of a classifier to be correlated with AU heatmaps. Using a composite loss, the classifier is trained to correctly classify an image while yielding interpretable visual layer-wise attention correlated with AU maps, simulating the expert decision process. Our strategy only relies on image class expression for supervision, without additional manual annotations. Our new strategy is generic, and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time. Our extensive evaluation on two public benchmarks, the RAF-DB and AffectNet datasets, shows that our proposed strategy can improve layer-wise interpretability without degrading classification performance. In addition, we explore a common type of interpretable classifier that relies on class activation mapping (CAM) methods, and show that our approach can also improve CAM interpretability.
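Abstract (reference translation): State-of-the-art classifiers for facial expression recognition (FER) can reach high accuracy, but they lack interpretability, which is an important feature for end-users. Experts usually associate spatial action units (AUs) from a codebook with facial regions in order to interpret expressions visually. This paper follows the same expert procedure. A new learning strategy is proposed that explicitly incorporates AU cues into classifier training, so that deep interpretable models can be trained. During training, the AU codebook is used together with the expression label of the input image and its facial landmarks to build an AU heatmap that marks the most discriminative image regions for that facial expression. This valuable spatial cue is used to train a deep interpretable classifier for FER. This is done by constraining the spatial layer features of the classifier to be correlated with the AU heatmaps. With a composite loss, the classifier is trained to classify an image correctly while producing interpretable, layer-wise visual attention that is correlated with the AU maps, simulating the expert decision process. The strategy relies only on the image-level expression class for supervision, with no additional manual annotation. It is generic and can be applied to any deep CNN- or transformer-based classifier without architectural changes or significant additional training time. An extensive evaluation on two public benchmarks, the RAF-DB and AffectNet datasets, shows that the proposed strategy can improve layer-wise interpretability without degrading classification performance. The authors also examine a common type of interpretable classifier that relies on class activation mapping (CAM) methods, and show that the approach can improve CAM interpretability as well.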
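
As a rough illustration of the training objective described in the abstract, the following Python sketch combines a classification term with a term that rewards layer attention for matching an AU heatmap. It is not the authors' implementation. The function names (`au_heatmap`, `layer_attention`, `composite_loss`), the Gaussian bumps around landmarks, channel averaging as the attention map, cosine similarity as the correlation measure, and the weight `lam` are all assumptions made for illustration; the abstract does not specify any of them. The landmarks passed in are assumed to be the ones the AU codebook selects for the labelled expression.

```python
import numpy as np

def au_heatmap(landmarks, size, sigma=2.0):
    """Build a (size, size) map with a Gaussian bump at each (row, col) landmark.

    Assumption: the landmarks are those tied to the AUs that the codebook
    lists for the image's expression label. Values lie in [0, 1].
    """
    rows, cols = np.mgrid[0:size, 0:size]
    heat = np.zeros((size, size), dtype=np.float64)
    for r, c in landmarks:
        bump = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, bump)
    return heat

def layer_attention(features):
    """Collapse a (channels, H, W) feature tensor into one (H, W) map (assumed: channel mean)."""
    return features.mean(axis=0)

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two maps, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single sample."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def composite_loss(logits, label, features, heatmap, lam=1.0):
    """Classification loss plus a penalty for attention that disagrees with the AU heatmap.

    Assumption: features and heatmap share the same spatial size; in practice
    the heatmap would be resized to each layer's resolution.
    """
    alignment_penalty = 1.0 - cosine_similarity(layer_attention(features), heatmap)
    return cross_entropy(logits, label) + lam * alignment_penalty

# Toy usage: attention that matches the heatmap is penalised less
# than attention that highlights the complementary regions.
heat = au_heatmap([(2, 2), (5, 5)], size=8)
logits = np.array([2.0, 0.5, -1.0])
aligned = composite_loss(logits, 0, np.stack([heat, heat]), heat)
misaligned = composite_loss(logits, 0, np.stack([1.0 - heat]), heat)
```

In this toy setup the only supervision is the class label, which is consistent with the abstract's claim that no extra manual annotation is needed: the heatmap is derived from the label, the codebook, and automatically detected landmarks.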

Related papers