Fugu-MT Paper Translation (Abstract): Realistic Lip Motion Generation Based on 3D Dynamic Viseme and Coarticulation Modeling for Human-Robot Interaction

Paper summary: Realistic Lip Motion Generation Based on 3D Dynamic Viseme and Coarticulation Modeling for Human-Robot Interaction

arxiv url: http://arxiv.org/abs/2604.01756v1
Date: Thu, 02 Apr 2026 08:24:49 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.610385
Title: Realistic Lip Motion Generation Based on 3D Dynamic Viseme and Coarticulation Modeling for Human-Robot Interaction
Title (reference translation): Realistic lip motion generation based on 3D dynamic visemes and coarticulation modeling for human-robot interaction
Authors: Sheng Li, Jingcheng Huang, Min Li
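Abstract summary: This paper proposes a lip motion generation framework based on 3D dynamic visemes and coarticulation modeling. The effectiveness and accuracy of the proposed architecture are validated experimentally. The work offers a lightweight, efficient, and practical paradigm for speech-driven lip motion generation on humanoid robots.
Reference score (independently computed attention level): 11.131577042400844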
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Realistic lip synchronization is essential for natural non-verbal human-robot interaction with humanoid robots. Motivated by this need, this paper presents a lip motion generation framework based on 3D dynamic viseme and coarticulation modeling. By analyzing Chinese pronunciation theory, a 3D dynamic viseme library is constructed based on the ARKit standard, which offers coherent prior trajectories of the lips. To resolve motion conflicts within continuous speech streams, a coarticulation mechanism is developed by incorporating initial-final (Shengmu-Yunmu) decoupling and energy modulation. After developing a strategy to retarget high-dimensional spatial lip motion to a 14-DOF lip actuation system of a humanoid head platform, the efficiency and accuracy of the proposed architecture are experimentally validated and demonstrated with quantitative ablation experiments using the Pearson Correlation Coefficient (PCC) and the Mean Absolute Jerk (MAJ) as metrics. This research offers a lightweight, efficient, and highly practical paradigm for the speech-driven lip motion generation of humanoid robots. The 3D dynamic viseme library and real-world deployment videos are available at https://github.com/yuesheng21/Phoneme-to-Lip-14DOF
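Abstract (reference translation): Realistic lip synchronization is essential for natural non-verbal human-robot interaction with humanoid robots. This work therefore proposes a lip motion generation framework based on 3D dynamic visemes and coarticulation modeling. Drawing on an analysis of Chinese pronunciation theory, a 3D dynamic viseme library is built on the ARKit standard, providing coherent prior trajectories for the lips. To resolve motion conflicts within continuous speech streams, a coarticulation mechanism is developed that incorporates initial-final (Shengmu-Yunmu) decoupling and energy modulation. After developing a strategy to retarget high-dimensional spatial lip motion to the 14-DOF lip actuation system of a humanoid head platform, the effectiveness and accuracy of the proposed architecture are validated experimentally and demonstrated through quantitative ablation experiments using the Pearson Correlation Coefficient (PCC) and Mean Absolute Jerk (MAJ) metrics. The work offers a lightweight, efficient, and practical paradigm for speech-driven lip motion generation on humanoid robots. The 3D dynamic viseme library and real-world deployment videos are available at https://github.com/yuesheng21/Phoneme-to-Lip-14DOF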
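
The abstract names two evaluation metrics, PCC and MAJ, without giving their formulas. The following is a minimal sketch of how such metrics are commonly computed for a uniformly sampled joint or blendshape trajectory. It is not taken from the paper: the function names, the third-order finite-difference definition of jerk, and the 60 Hz sampling rate in the usage example are all assumptions made for illustration.

```python
import numpy as np


def pearson_correlation(reference, generated):
    """Pearson Correlation Coefficient between two equal-length 1-D trajectories."""
    reference = np.asarray(reference, dtype=float)
    generated = np.asarray(generated, dtype=float)
    ref_centered = reference - reference.mean()
    gen_centered = generated - generated.mean()
    denom = np.sqrt((ref_centered ** 2).sum() * (gen_centered ** 2).sum())
    if denom == 0.0:
        return float("nan")  # PCC is undefined when either trajectory is constant
    return float((ref_centered * gen_centered).sum() / denom)


def mean_absolute_jerk(trajectory, dt):
    """Mean absolute jerk, assuming jerk is the third finite difference over dt**3.

    `trajectory` is sampled uniformly along axis 0 with spacing `dt` (seconds);
    extra axes (e.g. one column per degree of freedom) are averaged together.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    if trajectory.shape[0] < 4:
        raise ValueError("need at least 4 samples to estimate jerk")
    jerk = np.diff(trajectory, n=3, axis=0) / dt ** 3
    return float(np.abs(jerk).mean())


if __name__ == "__main__":
    # Hypothetical data: a 2 Hz sinusoid sampled at an assumed 60 Hz.
    dt = 1.0 / 60.0
    t = np.arange(0.0, 1.0, dt)
    reference = np.sin(2.0 * np.pi * 2.0 * t)
    generated = 0.9 * reference + 0.05
    print("PCC:", pearson_correlation(reference, generated))
    print("MAJ:", mean_absolute_jerk(generated, dt))
```

Under these definitions, a higher PCC indicates that the generated trajectory follows the reference more closely, and a lower MAJ indicates smoother motion.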
