Fugu-MT Paper Translation (Abstract): ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception

Paper overview: ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception

arxiv url: http://arxiv.org/abs/2604.12255v1
Date: Tue, 14 Apr 2026 04:05:07 GMT
Status: Translation complete
Last updated in system: 2026-04-15 19:11:32.227279
Title: ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception
Authors: Huanzhen Wang, Ziheng Zhou, Jiaqi Song, Li He, Yunshi Lan, Yan Wang, Wenqiang Zhang
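Abstract summary: This paper proposes ARGen, which uses data-adaptive dynamic expression generation to make emotion recognition more robust. ARGen operates in two stages: Affective Semantic Injection (ASI) and Adaptive Reinforcement Diffusion (ARD).
Reference score (attention level computed by this site): 38.35698479436818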
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Dynamic facial expression recognition in the wild remains challenging due to data scarcity and long-tail distributions, which hinder models from effectively learning the temporal dynamics of scarce emotions. To address these limitations, we propose ARGen, an Affect-Reinforced Generative Augmentation Framework that enables data-adaptive dynamic expression generation for robust emotion perception. ARGen operates in two stages: Affective Semantic Injection (ASI) and Adaptive Reinforcement Diffusion (ARD). The ASI stage establishes affective knowledge alignment through facial Action Units and employs a retrieval-augmented prompt generation strategy to synthesize consistent and fine-grained affective descriptions via large-scale visual-language models, thereby injecting interpretable emotional priors into the generation process. The ARD stage integrates text-conditioned image-to-video diffusion with reinforcement learning, introducing inter-frame conditional guidance and a multi-objective reward function to jointly optimize expression naturalness, facial integrity, and generative efficiency. Extensive experiments on both generation and recognition tasks verify that ARGen substantially enhances synthesis fidelity and improves recognition performance, establishing an interpretable and generalizable generative augmentation paradigm for vision-based affective computing.
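The abstract names three objectives for the ARD reward (expression naturalness, facial integrity, generative efficiency) but gives no formula. As a minimal sketch only, assuming a normalized weighted sum, which is a common way to scalarize several objectives and not necessarily what the paper uses, the combination could look like the following. The names `RewardWeights` and `combined_reward`, the default weights, and the assumption that each score lies in [0, 1] are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the abstract does not say how the objectives are balanced.
    naturalness: float = 1.0
    integrity: float = 1.0
    efficiency: float = 0.5


def combined_reward(
    naturalness: float,
    integrity: float,
    efficiency: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Normalized weighted sum of three scores, each assumed to lie in [0, 1]."""
    scores = {
        "naturalness": naturalness,
        "integrity": integrity,
        "efficiency": efficiency,
    }
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {value}")

    total_weight = weights.naturalness + weights.integrity + weights.efficiency
    weighted = (
        weights.naturalness * naturalness
        + weights.integrity * integrity
        + weights.efficiency * efficiency
    )
    # Dividing by the total weight keeps the result in [0, 1].
    return weighted / total_weight
```

Normalizing by the total weight keeps the scalar reward on the same scale as its inputs, so changing one weight shifts the balance between objectives without changing the reward's range.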

Related papers