Fugu-MT Paper Translation (Summary): Guided Interpretable Facial Expression Recognition via Spatial Action Unit Cues

Paper summary: Guided Interpretable Facial Expression Recognition via Spatial Action Unit Cues

arxiv url: http://arxiv.org/abs/2402.00281v2
Date: Fri, 2 Feb 2024 02:56:43 GMT
Status: Translation complete
Last updated in system: 2024-02-05 11:49:17.659757
Title: Guided Interpretable Facial Expression Recognition via Spatial Action Unit Cues
Title (reference translation): Guiding Facial Expression Recognition with Spatial Action Unit Cues
Authors: Soufiane Belharbi, Marco Pedersoli, Alessandro Lameiras Koerich, Simon Bacon, Eric Granger
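Abstract summary: This work proposes a learning strategy that explicitly incorporates spatial action unit (AU) cues into classifier training to build a deep interpretable model. The proposed method can improve layer-wise interpretability without degrading classification performance.
Reference score (independently computed attention score): 59.3149596834771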
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While state-of-the-art facial expression recognition (FER) classifiers achieve a high level of accuracy, they lack interpretability, an important aspect for end-users. To recognize basic facial expressions, experts resort to a codebook associating a set of spatial action units with each facial expression. In this paper, we follow in the experts' footsteps and propose a learning strategy that allows us to explicitly incorporate spatial action unit (AU) cues into the classifier's training to build a deep interpretable model. In particular, using this AU codebook, the input image's expression label, and facial landmarks, a single action unit heatmap is built to indicate the most discriminative regions of interest in the image with respect to the facial expression. We leverage this valuable spatial cue to train a deep interpretable classifier for FER. This is achieved by constraining the spatial layer features of a classifier to be correlated with AU maps. Using a composite loss, the classifier is trained to correctly classify an image while yielding interpretable visual layer-wise attention correlated with AU maps, simulating the experts' decision process. This is achieved using only the image class expression as supervision and without any extra manual annotations. Moreover, our method is generic. It can be applied to any CNN- or transformer-based deep classifier without the need for architectural change or adding significant training time. Our extensive evaluation on two public benchmarks, the RAF-DB and AffectNet datasets, shows that our proposed strategy can improve layer-wise interpretability without degrading classification performance. In addition, we explore a common type of interpretable classifier that relies on Class-Activation Mapping (CAM) methods, and we show that our training technique improves CAM interpretability.
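Abstract (reference translation): State-of-the-art facial expression recognition (FER) classifiers achieve high accuracy but lack interpretability, which is an important aspect for end-users. To recognize basic facial expressions, experts use a codebook that associates a set of spatial action units with each facial expression. This paper follows in the experts' footsteps and proposes a learning strategy that explicitly incorporates spatial action unit (AU) cues into classifier training to build a deep interpretable model. Specifically, the AU codebook, the expression label of the input image, and facial landmarks are used to build a single action unit heatmap that indicates the most discriminative regions of the image with respect to the facial expression. This valuable spatial cue is used to train a deep interpretable classifier for FER. This is achieved by constraining the classifier's spatial layer features to be correlated with AU maps. With a composite loss, the classifier is trained to classify images correctly while producing interpretable layer-wise visual attention that is correlated with AU maps, simulating the experts' decision process. This requires only the image's expression class as supervision, with no additional manual annotations. Furthermore, the method is generic. It can be applied to any CNN- or transformer-based deep classifier without architectural changes or significant additional training time. Extensive evaluation on two public benchmarks, the RAF-DB and AffectNet datasets, shows that the proposed strategy can improve layer-wise interpretability without degrading classification performance. The paper also examines a common type of interpretable classifier that relies on Class-Activation Mapping (CAM) methods and shows that the training technique improves CAM interpretability.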
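
The abstract describes two steps: building a single AU heatmap from a codebook, the expression label, and facial landmarks, and then training with a composite loss that ties layer attention to that heatmap. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The codebook entries, the AU-to-landmark mapping, the landmark names, the Gaussian spread `sigma`, the cosine-based alignment term, and the weight `lam` are all illustrative assumptions that the abstract does not specify. The attention map is assumed to be already computed from a layer and resized to the heatmap's resolution.

```python
import math

# Hypothetical codebook: expression label -> action unit ids (illustrative only).
AU_CODEBOOK = {"happiness": [6, 12]}

# Hypothetical mapping: action unit id -> names of nearby facial landmarks.
AU_TO_LANDMARKS = {
    6: ["left_cheek", "right_cheek"],
    12: ["left_mouth_corner", "right_mouth_corner"],
}


def build_au_heatmap(label, landmarks, height, width, sigma=1.5):
    """Sum one Gaussian per AU-related landmark, then scale so the peak is 1."""
    heat = [[0.0] * width for _ in range(height)]
    for au in AU_CODEBOOK[label]:
        for name in AU_TO_LANDMARKS[au]:
            cx, cy = landmarks[name]  # landmark given as (x, y) in grid units
            for y in range(height):
                for x in range(width):
                    dist2 = (x - cx) ** 2 + (y - cy) ** 2
                    heat[y][x] += math.exp(-dist2 / (2 * sigma ** 2))
    peak = max(max(row) for row in heat)
    if peak == 0:
        return heat
    return [[v / peak for v in row] for row in heat]


def cosine_similarity(a, b):
    """Cosine similarity between two 2D maps of the same shape."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    return dot / norm if norm > 0 else 0.0


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def composite_loss(logits, target, attention, au_map, lam=1.0):
    """Classification loss plus a penalty when attention disagrees with the AU map."""
    alignment_penalty = 1.0 - cosine_similarity(attention, au_map)
    return cross_entropy(logits, target) + lam * alignment_penalty


# Usage on a toy 7x7 grid with made-up landmark positions.
landmarks = {
    "left_cheek": (1, 2),
    "right_cheek": (5, 2),
    "left_mouth_corner": (2, 5),
    "right_mouth_corner": (4, 5),
}
au_map = build_au_heatmap("happiness", landmarks, height=7, width=7)
uniform_attention = [[1.0] * 7 for _ in range(7)]

aligned = composite_loss([2.0, 0.1], 0, au_map, au_map)
misaligned = composite_loss([2.0, 0.1], 0, uniform_attention, au_map)
print(aligned < misaligned)
```

In this toy setup, attention that matches the AU map incurs only the classification loss, while uniform attention pays an extra alignment penalty. In a real training loop the same composite objective would be expressed with a differentiable framework so that gradients flow through the attention term.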

Related papers