Fugu-MT Paper Translation (Abstract): GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos

Paper summary: GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos

arxiv url: http://arxiv.org/abs/2604.07273v1
Date: Wed, 08 Apr 2026 16:34:07 GMT
Status: Translation complete
Last updated in system: 2026-04-09 17:30:51.639249
Title: GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos
Title (reference translation): GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos
Authors: Yiqian Wu, Rawal Khirodkar, Egor Zakharov, Timur Bagautdinov, Lei Xiao, Zhaoen Su, Shunsuke Saito, Xiaogang Jin, Junxuan Li
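Abstract summary: We propose GenLCA, a diffusion-based generative model for generating and editing full-body avatars from text and image inputs. The core idea is a new paradigm that enables training a full-body 3D diffusion model from partially observable 2D data. We demonstrate the efficacy of the method through diverse, high-fidelity generation and editing results, outperforming existing solutions by a large margin.
Reference score (independently computed attention score): 41.35569686093567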
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present GenLCA, a diffusion-based generative model for generating and editing photorealistic full-body avatars from text and image inputs. The generated avatars are faithful to the inputs, while supporting high-fidelity facial and full-body animations. The core idea is a novel paradigm that enables training a full-body 3D diffusion model from partially observable 2D data, allowing the training dataset to scale to millions of real-world videos. This scalability contributes to the superior photorealism and generalizability of GenLCA. Specifically, we scale up the dataset by repurposing a pretrained feed-forward avatar reconstruction model as an animatable 3D tokenizer, which encodes unstructured video frames into structured 3D tokens. However, most real-world videos only provide partial observations of body parts, resulting in excessive blurring or transparency artifacts in the 3D tokens. To address this, we propose a novel visibility-aware diffusion training strategy that replaces invalid regions with learnable tokens and computes losses only over valid regions. We then train a flow-based diffusion model on the token dataset, inherently maintaining the photorealism and animatability provided by the pretrained avatar reconstruction model. Our approach effectively enables the use of large-scale real-world video data to train a diffusion model natively in 3D. We demonstrate the efficacy of our method through diverse and high-fidelity generation and editing results, outperforming existing solutions by a large margin. The project page is available at https://onethousandwu.com/GenLCA-Page.
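Abstract (reference translation): We propose GenLCA, a diffusion-based generative model for generating and editing photorealistic full-body avatars from text and image inputs. The generated avatars are faithful to the inputs and support high-fidelity facial and full-body animation. The core idea is a new paradigm that trains a full-body 3D diffusion model from partially observable 2D data, allowing the training dataset to scale to millions of real-world videos. This scalability contributes to GenLCA's superior photorealism and generalizability. Specifically, the dataset is scaled up by repurposing a pretrained feed-forward avatar reconstruction model as an animatable 3D tokenizer that encodes unstructured video frames into structured 3D tokens. However, most real-world videos provide only partial observations of the body, which causes excessive blurring or transparency artifacts in the 3D tokens. To address this, we propose a new visibility-aware diffusion training strategy that replaces invalid regions with learnable tokens and computes losses only over valid regions. We then train a flow-based diffusion model on the token dataset, inherently preserving the photorealism and animatability provided by the pretrained avatar reconstruction model. The proposed method makes it possible to use large-scale real-world video data to train a diffusion model natively in 3D. We demonstrate its efficacy through diverse, high-fidelity generation and editing results, outperforming existing solutions by a large margin. The project page is available at https://onethousandwu.com/GenLCA-Page.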
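The visibility-aware training strategy described in the abstract (replace invalid regions with learnable tokens, compute losses only over valid regions) can be illustrated with a minimal NumPy sketch. The abstract does not state the loss or the interpolation path, so everything below is an assumption made for illustration: the function names `apply_visibility_mask` and `masked_flow_matching_loss`, the token shapes, and the rectified-flow-style linear path with velocity target `noise - x0`. It is not the authors' implementation.

```python
import numpy as np


def apply_visibility_mask(tokens, valid, learnable_token):
    """Replace tokens in invalid (unobserved) regions with a shared placeholder.

    tokens:          (N, D) array of 3D tokens
    valid:           (N,) boolean mask, True where the region was observed
    learnable_token: (D,) vector; a trained parameter in a real model (assumed)
    """
    return np.where(valid[:, None], tokens, learnable_token[None, :])


def masked_flow_matching_loss(pred_velocity, x0, noise, valid):
    """Mean squared velocity error, averaged over valid tokens only.

    Assumes the linear path x_t = (1 - t) * x0 + t * noise, whose velocity
    is noise - x0. Invalid tokens contribute nothing to the loss.
    """
    target = noise - x0
    per_token = ((pred_velocity - target) ** 2).mean(axis=1)  # shape (N,)
    n_valid = max(int(valid.sum()), 1)  # guard against an all-invalid sample
    return float((per_token * valid).sum() / n_valid)


# Toy example: 4 tokens of dimension 3, the third one unobserved.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 3))
noise = rng.normal(size=(4, 3))
valid = np.array([True, True, False, True])

masked_x0 = apply_visibility_mask(x0, valid, learnable_token=np.zeros(3))

# A prediction that is exact on valid tokens and arbitrary on the invalid one.
pred = noise - x0
pred[2] += 100.0
loss = masked_flow_matching_loss(pred, x0, noise, valid)
```

In this sketch the error on the unobserved token is ignored, so corrupted (blurry or transparent) regions of a token set would not pull the model toward reproducing them.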

Related papers