Fugu-MT Paper Translation (Abstract): EmbedTalk: Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation

Paper overview: EmbedTalk: Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation

arxiv url: http://arxiv.org/abs/2603.07604v1
Date: Sun, 08 Mar 2026 12:21:47 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:14.904567
Title: EmbedTalk: Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation
Title (reference translation): EmbedTalk: Triplane-Free Talking Head Synthesis Using Embedding-Driven Gaussian Deformation
Authors: Arpita Saggar, Jonathan C. Darling, Duygu Sarikaya, David C. Hogg
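Abstract summary: Real-time talking head synthesis relies on deformable 3D Gaussian Splatting (3DGS). Recent work has shown the superiority of learnt embeddings for driving temporal deformations in 4D scene reconstruction. EmbedTalk outperforms existing 3DGS-based methods in rendering quality, lip synchronisation, and motion consistency.
Reference score (independently computed attention measure): 5.207307163958805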
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-time talking head synthesis increasingly relies on deformable 3D Gaussian Splatting (3DGS) due to its low latency. Tri-planes are the standard choice for encoding Gaussians prior to deformation, since they provide a continuous domain with explicit spatial relationships. However, tri-plane representations are limited by grid resolution and approximation errors introduced by projecting 3D volumetric fields onto 2D subspaces. Recent work has shown the superiority of learnt embeddings for driving temporal deformations in 4D scene reconstruction. We introduce $\textbf{EmbedTalk}$, which shows how such embeddings can be leveraged for modelling speech deformations in talking head synthesis. Through comprehensive experiments, we show that EmbedTalk outperforms existing 3DGS-based methods in rendering quality, lip synchronisation, and motion consistency, while remaining competitive with state-of-the-art generative models. Moreover, replacing the tri-plane encoding with learnt embeddings enables significantly more compact models that achieve over 60 FPS on a mobile GPU (RTX 2060 6 GB). Our code will be placed in the public domain on acceptance.
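Abstract (reference translation): Real-time talking head synthesis relies on deformable 3D Gaussian Splatting (3DGS) because of its low latency. Tri-planes are the standard choice for encoding Gaussians prior to deformation. However, tri-plane representations are limited by grid resolution and by the approximation errors introduced when 3D volumetric fields are projected onto 2D subspaces. Recent work has shown the superiority of learnt embeddings for driving temporal deformations in 4D scene reconstruction. We show how such embeddings can be leveraged to model speech deformations in talking head synthesis. Through comprehensive experiments, we show that EmbedTalk outperforms existing 3DGS-based methods in rendering quality, lip synchronisation, and motion consistency, while remaining competitive with state-of-the-art generative models. Moreover, replacing the tri-plane encoding with learnt embeddings enables far more compact models that exceed 60 FPS on a mobile GPU (RTX 2060 6 GB). Our code will be placed in the public domain on acceptance.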
