Fugu-MT Paper Translation (Abstract): CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

Paper abstract: CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

arxiv url: http://arxiv.org/abs/2406.13897v1
Date: Thu, 30 May 2024 05:57:36 GMT
Status: Translation complete
Last updated in system: 2024-06-23 13:15:04.270822
Title: CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets
Authors: Longwen Zhang, Ziyu Wang, Qixuan Zhang, Qiwei Qiu, Anqi Pang, Haoran Jiang, Wei Yang, Lan Xu, Jingyi Yu
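Abstract summary: CLAY is a 3D geometry and material generator designed to transform human imagination into intricate 3D digital structures. At its core is a large-scale generative model composed of a multi-resolution Variational Autoencoder (VAE) and a minimalistic latent Diffusion Transformer (DiT). We demonstrate using CLAY for a range of controllable 3D asset creations, from sketchy conceptual designs to production-ready assets with intricate details.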
Reference score (attention score computed independently by the site): 43.315487682462845
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In the realm of digital creativity, our potential to craft intricate 3D worlds from imagination is often hampered by the limitations of existing digital tools, which demand extensive expertise and effort. To narrow this disparity, we introduce CLAY, a 3D geometry and material generator designed to effortlessly transform human imagination into intricate 3D digital structures. CLAY supports classic text or image inputs as well as 3D-aware controls from diverse primitives (multi-view images, voxels, bounding boxes, point clouds, implicit representations, etc.). At its core is a large-scale generative model composed of a multi-resolution Variational Autoencoder (VAE) and a minimalistic latent Diffusion Transformer (DiT), to extract rich 3D priors directly from a diverse range of 3D geometries. Specifically, it adopts neural fields to represent continuous and complete surfaces and uses a geometry generative module with pure transformer blocks in latent space. We present a progressive training scheme to train CLAY on an ultra-large 3D model dataset obtained through a carefully designed processing pipeline, resulting in a 3D-native geometry generator with 1.5 billion parameters. For appearance generation, CLAY sets out to produce physically-based rendering (PBR) textures by employing a multi-view material diffusion model that can generate 2K resolution textures with diffuse, roughness, and metallic modalities. We demonstrate using CLAY for a range of controllable 3D asset creations, from sketchy conceptual designs to production-ready assets with intricate details. Even first-time users can easily use CLAY to bring their vivid 3D imaginations to life, unleashing unlimited creativity.
