Fugu-MT Paper Translation (Summary): Splatshot: 3D Face Avatar Generation from a Single Unconstrained Photo

Paper summary: Splatshot: 3D Face Avatar Generation from a Single Unconstrained Photo

arxiv url: http://arxiv.org/abs/2606.01493v1
Date: Sun, 31 May 2026 23:19:44 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:29.734372
Title: Splatshot: 3D Face Avatar Generation from a Single Unconstrained Photo
Title (reference translation): Splatshot: Generating a 3D Face Avatar from a Single Unconstrained Photo
Authors: Hao Liang, Zhixuan Ge, Soumendu Majee, Joanna Li, Ashok Veeraraghavan, Guha Balakrishnan
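Abstract summary: SplatShot is a training-free framework that couples explicit 3D representations with 2D diffusion priors directly within the denoising process. SplatShot produces 3D avatars with superior identity preservation, photorealism, and multi-view consistency.
Reference score (independently computed attention score): 22.10478800887373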
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reconstructing a photorealistic 3D face avatar from a single unconstrained photograph is challenging: feed-forward 3D Gaussian Splatting (3DGS) models degrade on out-of-distribution inputs, while pretrained diffusion models produce high-fidelity images but lack multi-view consistency. We observe that these paradigms are fundamentally complementary: explicit 3D representations guarantee geometric consistency, whereas 2D diffusion priors ensure photorealism. Building on this, we propose SplatShot, a training-free framework that couples these representations directly within the denoising process. Given a base 3DGS face model and a single reference image, we jointly denoise all target views using a per-step 3D feedback loop. At each timestep, we predict clean images from the noisy latents, refit the 3DGS to these multi-view predictions, and back-propagate the photometric discrepancy between the 3DGS re-renderings and 2D predictions into the noise estimate. This steers the sampling trajectory toward strictly 3D-coherent, identity-faithful outputs. Experiments on diverse in-the-wild images demonstrate that SplatShot produces 3D avatars with superior identity preservation, photorealism, and multi-view consistency.
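Abstract (reference translation): Feed-forward 3D Gaussian Splatting (3DGS) models degrade on out-of-distribution inputs, while pretrained diffusion models produce high-fidelity images but lack multi-view consistency. These paradigms are fundamentally complementary: explicit 3D representations guarantee geometric consistency, whereas 2D diffusion priors ensure photorealism. Building on this, we propose SplatShot, a training-free framework that couples these representations directly within the denoising process. Given a base 3DGS face model and a single reference image, all target views are denoised jointly using a per-step 3D feedback loop. At each timestep, clean images are predicted from the noisy latents, the 3DGS is refit to these multi-view predictions, and the photometric discrepancy between the 3DGS re-renderings and the 2D predictions is back-propagated into the noise estimate. This steers the sampling trajectory toward strictly 3D-coherent, identity-faithful outputs. SplatShot produces 3D avatars with superior identity preservation, photorealism, and multi-view consistency.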
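
The per-step feedback loop described in the abstract (predict clean images, refit a shared 3D model, push the discrepancy back into the noise estimate) can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it (`sample_views`, `view_spread`, `guidance_weight`, `view_priors`) is hypothetical. It rests on strong simplifying assumptions: each "view" is a plain vector rather than an image, the shared "3D model" is a single vector rendered identically into every view (so refitting reduces to a mean), the pretrained diffusion model is replaced by the exact posterior mean of a Gaussian prior, and the gradient of the photometric loss is written in closed form instead of being back-propagated through a renderer.

```python
import numpy as np


def sample_views(guidance_weight, num_views=4, dim=16, steps=50, seed=0):
    """Toy deterministic (DDIM-style) sampler with a per-step consistency loop.

    All quantities are stand-ins: vectors play the role of images and one
    shared vector plays the role of the 3D representation.
    """
    rng = np.random.default_rng(seed)
    reference = rng.normal(size=dim)  # stand-in for the reference photo
    # Each view has its own slightly different 2D prior mean: plausible per
    # view, but not consistent across views.
    view_priors = reference + 0.5 * rng.normal(size=(num_views, dim))
    prior_var = 0.1
    # alpha_bars[t] is the cumulative signal level; larger t means more noise.
    alpha_bars = np.linspace(0.999, 0.01, steps)
    x = rng.normal(size=(num_views, dim))  # all views start from pure noise

    for t in range(steps - 1, -1, -1):
        ab = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0

        # 1) Predict clean views from the noisy ones (Gaussian posterior mean,
        #    standing in for a learned denoiser).
        gain = np.sqrt(ab) * prior_var / (ab * prior_var + (1.0 - ab))
        x0_pred = view_priors + gain * (x - np.sqrt(ab) * view_priors)

        # 2) Refit the shared model to the multi-view predictions. With an
        #    identity "renderer" the least-squares fit is the mean over views.
        shared_model = x0_pred.mean(axis=0)
        renders = np.broadcast_to(shared_model, x0_pred.shape)

        # 3) Gradient of 0.5 * ||x0_pred - renders||^2 with respect to
        #    x0_pred, holding the renders fixed.
        grad = x0_pred - renders
        x0_guided = x0_pred - guidance_weight * grad

        # Noise estimate implied by the guided clean prediction.
        eps = (x - np.sqrt(ab) * x0_guided) / np.sqrt(1.0 - ab)

        # Deterministic DDIM update to the next (less noisy) level.
        x = np.sqrt(ab_prev) * x0_guided + np.sqrt(1.0 - ab_prev) * eps

    return x


def view_spread(views):
    """Mean per-coordinate variance across views (lower is more consistent)."""
    return float(np.mean(np.var(views, axis=0)))


unguided = view_spread(sample_views(guidance_weight=0.0))
guided = view_spread(sample_views(guidance_weight=0.5))
print("cross-view spread without guidance:", unguided)
print("cross-view spread with guidance:   ", guided)
```

In this toy setting, setting `guidance_weight` to zero recovers independent per-view sampling, while a positive weight pulls the views toward the shared model at every step. A real system would replace the mean with an actual 3DGS fit and differentiable rendering.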


Related papers