Fugu-MT Paper Translation (Abstract): Face Anything: 4D Face Reconstruction from Any Image Sequence

Paper summary: Face Anything: 4D Face Reconstruction from Any Image Sequence

arxiv url: http://arxiv.org/abs/2604.19702v1
Date: Tue, 21 Apr 2026 17:22:39 GMT
Status: Translation complete
System update date: 2026-04-22 22:41:49.9016
Title: Face Anything: 4D Face Reconstruction from Any Image Sequence
Title (reference translation): "Face Anything": Reconstructing 4D Faces from Face Images
Authors: Umut Kocasari, Simon Giebenhain, Richard Shaw, Matthias Nießner
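Abstract summary: This work proposes a unified method for high-fidelity 4D facial reconstruction based on canonical facial point prediction. By jointly predicting depth and canonical coordinates, the method enables accurate depth estimation, temporally stable reconstruction, dense 3D geometry, and robust facial point tracking.
Reference score (independently computed attention score): 49.395407357499074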
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate reconstruction and tracking of dynamic human faces from image sequences is challenging because non-rigid deformations, expression changes, and viewpoint variations occur simultaneously, creating significant ambiguity in geometry and correspondence estimation. We present a unified method for high-fidelity 4D facial reconstruction based on canonical facial point prediction, a representation that assigns each pixel a normalized facial coordinate in a shared canonical space. This formulation transforms dense tracking and dynamic reconstruction into a canonical reconstruction problem, enabling temporally consistent geometry and reliable correspondences within a single feed-forward model. By jointly predicting depth and canonical coordinates, our method enables accurate depth estimation, temporally stable reconstruction, dense 3D geometry, and robust facial point tracking within a single architecture. We implement this formulation using a transformer-based model that jointly predicts depth and canonical facial coordinates, trained using multi-view geometry data that non-rigidly warps into the canonical space. Extensive experiments on image and video benchmarks demonstrate state-of-the-art performance across reconstruction and tracking tasks, achieving approximately 3$\times$ lower correspondence error and faster inference than prior dynamic reconstruction methods, while improving depth accuracy by 16%. These results highlight canonical facial point prediction as an effective foundation for unified feed-forward 4D facial reconstruction.
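Abstract (reference translation): Accurate reconstruction and tracking of dynamic human faces from image sequences is difficult because non-rigid deformations, expression changes, and viewpoint changes occur simultaneously, creating significant ambiguity in geometry and correspondence estimation. We propose a unified method for high-fidelity 4D facial reconstruction based on canonical facial point prediction, a representation that assigns each pixel a normalized facial coordinate in a shared canonical space. This formulation turns dense tracking and dynamic reconstruction into a canonical reconstruction problem, enabling temporally consistent geometry and reliable correspondences within a single feed-forward model. By jointly predicting depth and canonical coordinates, the method enables accurate depth estimation, temporally stable reconstruction, dense 3D geometry, and robust facial point tracking within a single architecture. The formulation is implemented with a transformer model that jointly predicts depth and canonical facial coordinates, trained on multi-view geometry data that is non-rigidly warped into the canonical space. Large-scale experiments on image and video benchmarks show state-of-the-art performance across reconstruction and tracking tasks, achieving approximately 3$\times$ lower correspondence error and faster inference than prior dynamic reconstruction methods, while improving depth accuracy by 16%. These results highlight canonical facial point prediction as an effective foundation for unified feed-forward 4D facial reconstruction.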
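
The abstract describes tracking as a lookup in canonical space: if every pixel carries a canonical facial coordinate, then pixels in two frames that share (nearly) the same canonical coordinate show the same facial point, and the predicted depth lifts the match to 3D. The following is a minimal sketch of that idea only, not the paper's implementation. The function names, the toy per-pixel predictions, the brute-force nearest-neighbour search, and the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) are all assumptions made for illustration.

```python
import math


def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a depth value into camera space (pinhole model)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def match_by_canonical(canon_a, canon_b):
    """For each pixel of frame A, pick the frame-B pixel whose canonical
    coordinate is closest in Euclidean distance (brute-force search)."""
    return {
        pa: min(canon_b, key=lambda pb: math.dist(ca, canon_b[pb]))
        for pa, ca in canon_a.items()
    }


# Hypothetical per-pixel predictions: pixel (u, v) -> canonical coordinate.
canon_a = {(10, 20): (0.10, 0.20, 0.30), (30, 40): (0.70, 0.10, 0.50)}
canon_b = {(12, 21): (0.11, 0.19, 0.30), (33, 38): (0.69, 0.12, 0.50)}
# Hypothetical predicted depth for frame B: pixel (u, v) -> depth.
depth_b = {(12, 21): 0.5, (33, 38): 0.6}

# Dense correspondence from frame A to frame B via the canonical space.
matches = match_by_canonical(canon_a, canon_b)

# Lift each matched frame-B pixel to 3D using its depth (assumed intrinsics).
track_3d = {
    pa: unproject(*pb, depth_b[pb], fx=500.0, fy=500.0, cx=32.0, cy=32.0)
    for pa, pb in matches.items()
}
```

Because matching goes through the shared canonical space rather than frame-to-frame motion, the same lookup works for any pair of frames, which is one way to read the abstract's claim of temporally consistent correspondences.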

Related papers