Fugu-MT Paper Translation (Abstract): Capturing Head Avatar with Hand Contacts from a Monocular Video

Paper overview: Capturing Head Avatar with Hand Contacts from a Monocular Video

arxiv url: http://arxiv.org/abs/2510.17181v1
Date: Mon, 20 Oct 2025 05:55:18 GMT
Status: Translation complete
Last updated in system: 2025-10-25 00:56:39.321295
Title: Capturing Head Avatar with Hand Contacts from a Monocular Video
Authors: Haonan He, Yufeng Zheng, Jie Song
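Abstract summary: Photorealistic 3D head avatars are vital for telepresence, gaming, and VR. This paper proposes a novel framework that jointly learns detailed head avatars and the non-rigid deformations induced by hand-face interactions.
Reference score (independently computed attention score): 11.762269003891165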
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Photorealistic 3D head avatars are vital for telepresence, gaming, and VR. However, most methods focus solely on facial regions, ignoring natural hand-face interactions, such as a hand resting on the chin or fingers gently touching the cheek, which convey cognitive states like pondering. In this work, we present a novel framework that jointly learns detailed head avatars and the non-rigid deformations induced by hand-face interactions. There are two principal challenges in this task. First, naively tracking hand and face separately fails to capture their relative poses. To overcome this, we propose to combine depth order loss with contact regularization during pose tracking, ensuring correct spatial relationships between the face and hand. Second, no publicly available priors exist for hand-induced deformations, making them non-trivial to learn from monocular videos. To address this, we learn a PCA basis specific to hand-induced facial deformations from a face-hand interaction dataset. This reduces the problem to estimating a compact set of PCA parameters rather than a full spatial deformation field. Furthermore, inspired by physics-based simulation, we incorporate a contact loss that provides additional supervision, significantly reducing interpenetration artifacts and enhancing the physical plausibility of the results. We evaluate our approach on RGB(D) videos captured by an iPhone. Additionally, to better evaluate the reconstructed geometry, we construct a synthetic dataset of avatars with various types of hand interactions. We show that our method can capture better appearance and more accurate deforming geometry of the face than SOTA surface reconstruction methods.
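
The abstract states that a PCA basis learned from a face-hand interaction dataset reduces the problem to estimating a compact set of PCA parameters instead of a full spatial deformation field. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes NumPy, uses synthetic random offsets in place of a real interaction dataset, and all function names (`learn_deformation_basis`, `fit_coefficients`, `reconstruct`, `demo`) are hypothetical.

```python
import numpy as np


def learn_deformation_basis(deformations, num_components):
    """Learn a PCA basis from per-vertex offsets.

    deformations: array of shape (num_samples, num_vertices, 3).
    Returns the mean offset vector and the top principal directions.
    """
    num_samples = deformations.shape[0]
    flat = deformations.reshape(num_samples, -1)
    mean = flat.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:num_components]


def fit_coefficients(observed, mean, basis):
    """Project one observed offset field (num_vertices, 3) onto the basis.

    Because the basis rows are orthonormal, this projection is the
    least-squares solution for the PCA coefficients.
    """
    return basis @ (observed.reshape(-1) - mean)


def reconstruct(coeffs, mean, basis, num_vertices):
    """Rebuild a full per-vertex offset field from the compact coefficients."""
    return (mean + coeffs @ basis).reshape(num_vertices, 3)


def demo():
    rng = np.random.default_rng(0)
    num_vertices, num_modes = 50, 4
    # Synthetic stand-in for an interaction dataset: every sample is a
    # random mix of a few hidden deformation modes (an assumption).
    modes = rng.normal(size=(num_modes, num_vertices * 3))
    weights = rng.normal(size=(200, num_modes))
    dataset = (weights @ modes).reshape(200, num_vertices, 3)

    mean, basis = learn_deformation_basis(dataset, num_modes)

    # An unseen deformation built from the same hidden modes.
    unseen = (rng.normal(size=num_modes) @ modes).reshape(num_vertices, 3)
    coeffs = fit_coefficients(unseen, mean, basis)
    recon = reconstruct(coeffs, mean, basis, num_vertices)
    return coeffs.shape, float(np.abs(recon - unseen).max())


if __name__ == "__main__":
    shape, max_error = demo()
    print("coefficients:", shape, "max reconstruction error:", max_error)
```

In this toy setting 150 unknowns (50 vertices times 3 coordinates) collapse to 4 coefficients. In a monocular-video pipeline the coefficients would presumably be optimized against image-based losses rather than obtained by direct projection, since the true offsets are not observed.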

Related papers