Fugu-MT Paper Translation (Abstract): 3DRealHead: Few-Shot Detailed Head Avatar

Paper overview: 3DRealHead: Few-Shot Detailed Head Avatar

arxiv url: http://arxiv.org/abs/2604.13171v1
Date: Tue, 14 Apr 2026 18:00:06 GMT
Status: Translation complete
Last updated in system: 2026-04-16 20:38:32.236716
Title: 3DRealHead: Few-Shot Detailed Head Avatar
Authors: Jalees Nehvi, Timo Bolkart, Thabo Beeler, Justus Thies
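Abstract summary: This paper introduces 3DRealHead, a head avatar reconstruction method with a novel expression control signal. The subject can take a few pictures of themselves, recover a 3D head avatar, and drive it with a consumer-level webcam. To animate the avatar, the U-Net is conditioned on 3DMM-based facial expression signals as well as features of the mouth region extracted from the driving video.
Reference score (attention score computed independently by the site): 37.50886855423571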
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The human face is central to communication. For immersive applications, the digital presence of a person should mirror the physical reality, capturing the user's idiosyncrasies and detailed facial expressions. However, current 3D head avatar methods often struggle to faithfully reproduce identity and facial expressions, despite having multi-view data or learned priors. Learning priors that capture the diversity of human appearances is challenging, especially for regions with highly person-specific features such as the mouth and teeth, because the underlying training data is limited. In addition, many avatar methods rely purely on 3D morphable model (3DMM)-based expression control, which strongly limits expressivity. To address these challenges, we introduce 3DRealHead, a few-shot head avatar reconstruction method with a novel expression control signal that is extracted from a monocular video stream of the subject. Specifically, the subject can take a few pictures of themselves, recover a 3D head avatar, and drive it with a consumer-level webcam. The avatar reconstruction is enabled by a novel few-shot inversion process of a 3D human head prior, which is represented as a Style U-Net that emits 3D Gaussian primitives that can be rendered under novel views. The prior is learned on the NeRSemble dataset. To animate the avatar, the U-Net is conditioned on 3DMM-based facial expression signals as well as features of the mouth region extracted from the driving video. These additional mouth features allow us to recover facial expressions that cannot be represented by the 3DMM, leading to higher expressivity and closer resemblance to the physical reality.
