Fugu-MT Paper Translation (Abstract): Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion

Paper abstract: Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion

arxiv url: http://arxiv.org/abs/2605.04412v2
Date: Thu, 07 May 2026 02:18:21 GMT
Status: Translation complete
Last updated in system: 2026-05-08 17:36:06.161818
Title: Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion
Title (reference translation): Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion
Authors: Yiran Qiao, Yiren Lu, Yunlai Zhou, Disheng Liu, Linlin Hou, Rui Yang, Yu Yin, Jing Ma
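Abstract summary: 3D asset generation plays an important role in fields such as gaming and virtual reality, enabling rapid synthesis of high-fidelity 3D objects. Existing approaches typically rely on style images that lie within, or are similar to, the training distribution of 3D generation models. The paper introduces DiLAST: 2D Diffusion-based Latent Awakening for 3D Style Transfer.
Reference score (independently calculated attention score): 16.012295855529935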
License: http://creativecommons.org/licenses/by/4.0/
Abstract: 3D asset generation plays a pivotal role in fields such as gaming and virtual reality, enabling the rapid synthesis of high-fidelity 3D objects from a single or multiple images. Building on this capability, enabling style-controllable generation naturally emerges as an important and desirable direction. However, existing approaches typically rely on style images that lie within or are similar to the training distribution of 3D generation models. When presented with out-of-distribution (OOD) styles, their performance degrades significantly or even fails. To address this limitation, we introduce DiLAST: 2D Diffusion-based Latent Awakening for 3D Style Transfer. Specifically, we leverage a pretrained 2D diffusion model as a teacher to provide rich and generalizable style priors. By aligning rendered views with the target style under diffusion-based guidance, our method optimizes the structured 3D latent representations for stylization. We observe that this limitation stems not from insufficient model capacity, but from the underutilization of structured 3D latents, which are inherently expressive. Despite being trained on comparatively limited data, 3D generation models can leverage 2D diffusion guidance to steer denoising toward specific directions in latent space, thereby producing diverse, OOD styles. Extensive experiments across diverse data and multiple 3D generation backbones demonstrate the effectiveness and plug-and-play nature of our approach.
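Abstract (reference translation): 3D asset generation plays an important role in fields such as gaming and virtual reality, enabling rapid synthesis of high-fidelity 3D objects from a single image or multiple images. Building on this capability, style-controllable generation naturally emerges as an important and desirable direction. However, existing approaches generally rely on style images that lie within, or are similar to, the training distribution of 3D generation models. When presented with out-of-distribution (OOD) styles, their performance degrades significantly or they fail outright. To address this limitation, we introduce DiLAST, a 2D diffusion-based latent awakening method for 3D style transfer. Specifically, a pretrained 2D diffusion model serves as a teacher that provides rich, generalizable style priors. The method aligns rendered views with the target style under diffusion-based guidance and thereby optimizes the structured 3D latent representations for stylization. The limitation stems not from insufficient model capacity but from underuse of structured 3D latents, which are inherently expressive. Despite being trained on comparatively limited data, 3D generation models can use 2D diffusion guidance to steer denoising toward specific directions in latent space and so produce diverse OOD styles. Extensive experiments across diverse data and multiple 3D generation backbones demonstrate the effectiveness and plug-and-play nature of the approach.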
