Fugu-MT Paper Translation (Summary): Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining

Paper summary: Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining

arxiv url: http://arxiv.org/abs/2604.02320v1
Date: Thu, 02 Apr 2026 17:58:40 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.987646
Title: Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining
Title (reference translation): Large-Scale Codec Avatars: The Unreasonable Effect of Large-Scale Avatar Pretraining
Authors: Junxuan Li, Rawal Khirodkar, Chengan He, Zhongshi Jiang, Giljoo Nam, Lingchen Yang, Jihyun Lee, Egor Zakharov, Zhaoen Su, Rinat Abdrashitov, Yuan Dong, Julieta Martinez, Kai Li, Qingyang Tan, Takaaki Shiratori, Matthew Hu, Peihong Guo, Xuhua Huang, Ariyan Zarei, Marco Pesavento, Yichen Xu, He Wen, Teng Deng, Wyatt Borsos, Anjali Thakrar, Jean-Charles Bazin, Carsten Stoll, Ginés Hidalgo, James Booth, Lucy Wang, Xiaowen Ma, Yu Rong, Sairanjith Thalanki, Chen Cao, Christian Häne, Abhishek Kar, Sofien Bouaziz, Jason Saragih, Yaser Sheikh, Shunsuke Saito
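Abstract summary: Large-scale Codec Avatars (LCA) is a high-fidelity, full-body 3D avatar model that generalizes to world-scale populations in a feedforward manner. LCA generalizes across hair styles, clothing, and demographics while providing precise, fine-grained facial expressions and finger-level articulation control.
Reference score (independently computed attention score): 62.501929209687056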
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: High-quality 3D avatar modeling faces a critical trade-off between fidelity and generalization. On the one hand, multi-view studio data enables high-fidelity modeling of humans with precise control over expressions and poses, but it struggles to generalize to real-world data due to limited scale and the domain gap between the studio environment and the real world. On the other hand, recent large-scale avatar models trained on millions of in-the-wild samples show promise for generalization across a wide range of identities, yet the resulting avatars are often of low-quality due to inherent 3D ambiguities. To address this, we present Large-Scale Codec Avatars (LCA), a high-fidelity, full-body 3D avatar model that generalizes to world-scale populations in a feedforward manner, enabling efficient inference. Inspired by the success of large language models and vision foundation models, we present, for the first time, a pre/post-training paradigm for 3D avatar modeling at scale: we pretrain on 1M in-the-wild videos to learn broad priors over appearance and geometry, then post-train on high-quality curated data to enhance expressivity and fidelity. LCA generalizes across hair styles, clothing, and demographics while providing precise, fine-grained facial expressions and finger-level articulation control, with strong identity preservation. Notably, we observe emergent generalization to relightability and loose garment support to unconstrained inputs, and zero-shot robustness to stylized imagery, despite the absence of direct supervision.
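Abstract (reference translation): High-quality 3D avatar modeling faces a critical trade-off between fidelity and generalization. On the one hand, multi-view studio data enables high-fidelity modeling of humans with precise control over expressions and poses, but it struggles to generalize to real-world data because of its limited scale and the domain gap between the studio environment and the real world. On the other hand, recent large-scale avatar models trained on millions of in-the-wild samples show promise for generalization across a wide range of identities, yet the resulting avatars are often of low quality because of inherent 3D ambiguities. We therefore propose Large-Scale Codec Avatars (LCA), a high-fidelity, full-body 3D avatar model. Inspired by the success of large language models and vision foundation models, we present, for the first time, a pre/post-training paradigm for 3D avatar modeling. LCA generalizes across hair styles, clothing, and demographics while providing precise, fine-grained facial expressions and finger-level articulation control, with strong identity preservation. Notably, despite the absence of direct supervision, we observe emergent generalization to relightability and loose-garment support, as well as zero-shot robustness to stylized imagery.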

Related papers