Fugu-MT paper translation (abstract): Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability

Paper summary: Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability

arxiv url: http://arxiv.org/abs/2402.12225v2
Date: Tue, 26 Mar 2024 15:06:00 GMT
Status: Translation complete
Last updated in system: 2024-03-27 21:53:51.130403
Title: Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability
Title (reference translation): Auto-regressive Models for 3D Shape Generation with Capacity and Scalability in Mind
Authors: Xuelin Qian, Yu Wang, Simian Luo, Yinda Zhang, Ying Tai, Zhenyu Zhang, Chengjie Wang, Xiangyang Xue, Bo Zhao, Tiejun Huang, Yunsheng Wu, Yanwei Fu
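Abstract summary: Auto-regressive models have achieved remarkable results in 2D image generation by modeling joint distributions in grid space. The authors extend auto-regressive models to the 3D domain and seek a stronger 3D shape generation ability by improving capacity and scalability simultaneously.
Reference score (independently computed attention score): 118.26563926533517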
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space. In this paper, we extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously. Firstly, we leverage an ensemble of publicly available 3D datasets to facilitate the training of large-scale models. It consists of a comprehensive collection of approximately 900,000 objects, with multiple properties of meshes, points, voxels, rendered images, and text captions. This diverse labeled dataset, termed Objaverse-Mix, empowers our model to learn from a wide range of object variations. However, directly applying 3D auto-regression encounters critical challenges of high computational demands on volumetric grids and ambiguous auto-regressive order along grid dimensions, resulting in inferior quality of 3D shapes. To this end, we then present a novel framework Argus3D in terms of capacity. Concretely, our approach introduces discrete representation learning based on a latent vector instead of volumetric grids, which not only reduces computational costs but also preserves essential geometric details by learning the joint distributions in a more tractable order. The capacity of conditional generation can thus be realized by simply concatenating various conditioning inputs to the latent vector, such as point clouds, categories, images, and texts. In addition, thanks to the simplicity of our model architecture, we naturally scale up our approach to a larger model with an impressive 3.6 billion parameters, further enhancing the quality of versatile 3D generation. Extensive experiments on four generation tasks demonstrate that Argus3D can synthesize diverse and faithful shapes across multiple categories, achieving remarkable performance.
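Abstract (reference translation): Auto-regressive models have achieved remarkable results in 2D image generation by modeling joint distributions in grid space. In this paper, we extend auto-regressive models to the 3D domain and seek a stronger 3D shape generation ability by improving capacity and scalability simultaneously. First, we leverage an ensemble of publicly available 3D datasets to facilitate the training of large-scale models. It consists of a comprehensive collection of approximately 900,000 objects with multiple properties: meshes, points, voxels, rendered images, and text captions. This diverse labeled dataset, called Objaverse-Mix, enables the model to learn from a wide range of object variations. However, directly applying 3D auto-regression runs into two critical challenges, the high computational demands of volumetric grids and the ambiguous auto-regressive order along grid dimensions, which degrade the quality of the 3D shapes. To this end, we present a novel framework, Argus3D, from the perspective of capacity. Specifically, we introduce discrete representation learning based on a latent vector instead of volumetric grids, which not only reduces computational cost but also preserves essential geometric details by learning the joint distributions in a more tractable order. Conditional generation can thus be realized by simply concatenating various conditioning inputs, such as point clouds, categories, images, and texts, to the latent vector. Furthermore, thanks to the simplicity of the model architecture, we naturally scale our approach up to a larger model with 3.6 billion parameters, further improving the quality of versatile 3D generation. Extensive experiments on four generation tasks show that Argus3D can synthesize diverse and faithful shapes across multiple categories and achieve strong performance.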
