Fugu-MT Paper Translation (Abstract): LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation

Paper summary: LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation

arxiv url: http://arxiv.org/abs/2510.21864v1
Date: Thu, 23 Oct 2025 10:09:24 GMT
Status: Translation complete
System update date: 2025-10-28 15:28:14.62533
Title: LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation
Authors: Xin Lu, Chuanqing Zhuang, Chenxi Jin, Zhengda Lu, Yiqun Wang, Wu Liu, Jun Xiao
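Abstract summary: The authors propose LSF-Animation, a novel framework that eliminates the reliance on explicit emotion and identity feature representations. Specifically, LSF-Animation implicitly extracts emotion information from speech and captures identity features from a neutral facial mesh. The method is reported to surpass recent state-of-the-art approaches in emotional expressiveness, identity generalization, and animation realism.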
Reference score (independently computed attention score): 37.790140423936776
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Speech-driven 3D facial animation has attracted increasing interest because of its potential to generate expressive and temporally synchronized digital humans. While recent works have begun to explore emotion-aware animation, they still depend on explicit one-hot encodings to represent identity and emotion with given emotion and identity labels, which limits their ability to generalize to unseen speakers. Moreover, the emotional cues inherently present in speech are often neglected, limiting the naturalness and adaptability of generated animations. In this work, we propose LSF-Animation, a novel framework that eliminates the reliance on explicit emotion and identity feature representations. Specifically, LSF-Animation implicitly extracts emotion information from speech and captures the identity features from a neutral facial mesh, enabling improved generalization to unseen speakers and emotional states without requiring manual labels. Furthermore, we introduce a Hierarchical Interaction Fusion Block (HIFB), which employs a fusion token to integrate dual transformer features and more effectively integrate emotional, motion-related and identity-related cues. Extensive experiments conducted on the 3DMEAD dataset demonstrate that our method surpasses recent state-of-the-art approaches in terms of emotional expressiveness, identity generalization, and animation realism. The source code will be released at: https://github.com/Dogter521/LSF-Animation.
