Fugu-MT Paper Translation (Summary): Variational Encoder--Multi-Decoder (VE-MD) for Privacy-by-functional-design (Group) Emotion Recognition

Paper summary: Variational Encoder--Multi-Decoder (VE-MD) for Privacy-by-functional-design (Group) Emotion Recognition

arxiv url: http://arxiv.org/abs/2604.02397v1
Date: Thu, 02 Apr 2026 13:38:29 GMT
Status: translation complete
Last updated in system: 2026-04-06 17:20:24.142213
Title: Variational Encoder--Multi-Decoder (VE-MD) for Privacy-by-functional-design (Group) Emotion Recognition
Authors: Anderson Augusma, Dominique Vaufreydaz, Frédérique Letué
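Abstract summary: This work proposes VE-MD, a variational multi-decoder framework for group emotion recognition built on a privacy-aware functional design. Rather than providing formal anonymization or cryptographic privacy guarantees, VE-MD is designed to avoid explicit individual monitoring. VE-MD learns a shared latent representation jointly optimized for emotion classification and internal prediction of body and facial structural representations.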
Reference score (independently computed attention score): 0.764671395172401
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Group Emotion Recognition (GER) aims to infer collective affect in social environments such as classrooms, crowds, and public events. Many existing approaches rely on explicit individual-level processing, including cropped faces, person tracking, or per-person feature extraction, which makes the analysis pipeline person-centric and raises privacy concerns in deployment scenarios where only group-level understanding is needed. This research proposes VE-MD, a Variational Encoder-Multi-Decoder framework for group emotion recognition under a privacy-aware functional design. Rather than providing formal anonymization or cryptographic privacy guarantees, VE-MD is designed to avoid explicit individual monitoring by constraining the model to predict only aggregate group-level affect, without identity recognition or per-person emotion outputs. VE-MD learns a shared latent representation jointly optimized for emotion classification and internal prediction of body and facial structural representations. Two structural decoding strategies are investigated: a transformer-based PersonQuery decoder and a dense Heatmap decoder that naturally accommodates variable group sizes. Experiments on six in-the-wild datasets, including two GER and four Individual Emotion Recognition (IER) benchmarks, show that structural supervision consistently improves representation learning. More importantly, the results reveal a clear distinction between GER and IER: optimizing the latent space alone is often insufficient for GER because it tends to attenuate interaction-related cues, whereas preserving explicit structural outputs improves collective affect inference. In contrast, projected structural representations seem to act as an effective denoising bottleneck for IER. VE-MD achieves state-of-the-art performance on GAF-3.0 (up to 90.06%) and VGAF (82.25% with multimodal fusion with audio). 
These results show that preserving interaction-related structural information is particularly beneficial for group-level affect modeling without relying on prior individual feature extraction. On IER datasets, using multimodal fusion with the audio modality, VE-MD outperforms the state of the art on SamSemo (77.9%, with the text modality added) while achieving competitive performance on MER-MULTI (63.8%), DFEW (70.7%), and EngageNet (69.0%).
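The abstract describes a shared latent representation that is jointly optimized for emotion classification and for structural prediction, but it does not state the loss. The following is only a minimal sketch of what such a multi-task variational objective could look like. It assumes a diagonal-Gaussian latent, a softmax cross-entropy emotion term, and a mean-squared-error heatmap term. The function names and the weights `w_struct` and `w_kl` are hypothetical and are not taken from the paper.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one sample, computed with log-sum-exp."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target_index]


def heatmap_mse(predicted, target):
    """Mean squared error between two flattened heatmaps of equal length."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def joint_loss(logits, label, pred_heatmap, true_heatmap, mu, log_var,
               w_struct=1.0, w_kl=0.01):
    """Weighted sum of the emotion, structural, and KL terms.

    The weights are illustrative assumptions, not values from the paper.
    """
    emotion_term = cross_entropy(logits, label)
    struct_term = heatmap_mse(pred_heatmap, true_heatmap)
    kl_term = kl_to_standard_normal(mu, log_var)
    return emotion_term + w_struct * struct_term + w_kl * kl_term


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.1, -0.2], [0.0, 0.0]
    z = reparameterize(mu, log_var, rng)  # latent sample shared by all decoders
    loss = joint_loss([2.0, 0.5, -1.0], 0, [0.9, 0.1], [1.0, 0.0], mu, log_var)
    print(z, loss)
```

In this sketch the structural term stays in the objective as an explicit output, which mirrors the abstract's observation that preserving structural outputs helps GER. Whether the authors weight or schedule the terms this way is not stated in the abstract.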


Related papers