Fugu-MT Paper Translation (Abstract): Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability

Paper summary: Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability

arxiv url: http://arxiv.org/abs/2402.12225v1
Date: Mon, 19 Feb 2024 15:33:09 GMT
Status: Translation complete
System update date: 2024-02-20 16:00:58.528741
Title: Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability
Title (reference translation): Auto-regressive Models for 3D Shape Generation with Capacity and Scalability in Mind
Authors: Xuelin Qian, Yu Wang, Simian Luo, Yinda Zhang, Ying Tai, Zhenyu Zhang, Chengjie Wang, Xiangyang Xue, Bo Zhao, Tiejun Huang, Yunsheng Wu, Yanwei Fu
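Abstract summary: Auto-regressive models have achieved remarkable results in 2D image generation by modeling joint distributions in grid space. This work extends auto-regressive models to the 3D domain and seeks stronger 3D shape generation by improving capacity and scalability simultaneously.
Reference score (independently calculated attention score): 121.44324465222498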
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space. In this paper, we extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously. Firstly, we leverage an ensemble of publicly available 3D datasets to facilitate the training of large-scale models. It consists of a comprehensive collection of approximately 900,000 objects, with multiple properties of meshes, points, voxels, rendered images, and text captions. This diverse labeled dataset, termed Objaverse-Mix, empowers our model to learn from a wide range of object variations. However, directly applying 3D auto-regression encounters critical challenges of high computational demands on volumetric grids and ambiguous auto-regressive order along grid dimensions, resulting in inferior quality of 3D shapes. To this end, we then present a novel framework Argus3D in terms of capacity. Concretely, our approach introduces discrete representation learning based on a latent vector instead of volumetric grids, which not only reduces computational costs but also preserves essential geometric details by learning the joint distributions in a more tractable order. The capacity of conditional generation can thus be realized by simply concatenating various conditioning inputs to the latent vector, such as point clouds, categories, images, and texts. In addition, thanks to the simplicity of our model architecture, we naturally scale up our approach to a larger model with an impressive 3.6 billion parameters, further enhancing the quality of versatile 3D generation. Extensive experiments on four generation tasks demonstrate that Argus3D can synthesize diverse and faithful shapes across multiple categories, achieving remarkable performance.
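Abstract (reference translation): Auto-regressive models have achieved remarkable results in 2D image generation by modeling joint distributions in grid space. In this paper, we extend auto-regressive models to the 3D domain and pursue stronger 3D shape generation by improving capacity and scalability at the same time. First, we leverage an ensemble of publicly available 3D datasets to facilitate the training of large-scale models. It is a comprehensive collection of approximately 900,000 objects with multiple properties: meshes, points, voxels, rendered images, and text captions. This diverse labeled dataset, called Objaverse-Mix, lets our model learn from a wide range of object variations. However, directly applying 3D auto-regression faces two critical challenges, namely high computational demands on volumetric grids and an ambiguous auto-regressive order along grid dimensions, which result in inferior 3D shape quality. To this end, we present a new framework, Argus3D, in terms of capacity. Specifically, our approach introduces discrete representation learning based on a latent vector instead of volumetric grids, which not only reduces computational cost but also preserves essential geometric details by learning the joint distributions in a more tractable order. Conditional generation can then be realized by simply concatenating various conditioning inputs, such as point clouds, categories, images, and texts, to the latent vector. In addition, thanks to the simplicity of the model architecture, we naturally scale our approach up to a larger model with 3.6 billion parameters, further improving the quality of versatile 3D generation. Extensive experiments on four generation tasks show that Argus3D can synthesize diverse and faithful shapes across multiple categories, achieving remarkable performance.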
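
The abstract describes two ideas that a small example can make concrete: turning a latent vector into a sequence of discrete tokens, and putting conditioning inputs in front of that sequence so an auto-regressive model can attend to them. The following is a minimal toy sketch of those two steps only, not the Argus3D implementation. The function names, the nearest-neighbor codebook lookup, the chunk size, and the token ids are all assumptions made for illustration.

```python
from typing import List, Sequence


def nearest_code(chunk: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `chunk` (squared L2 distance)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(chunk, codebook[k]))


def quantize_latent(
    latent: Sequence[float],
    codebook: Sequence[Sequence[float]],
    chunk_size: int,
) -> List[int]:
    """Split a 1D latent vector into chunks and map each chunk to a discrete token id."""
    if len(latent) % chunk_size != 0:
        raise ValueError("latent length must be a multiple of chunk_size")
    return [
        nearest_code(latent[i:i + chunk_size], codebook)
        for i in range(0, len(latent), chunk_size)
    ]


def build_sequence(condition_tokens: Sequence[int], shape_tokens: Sequence[int]) -> List[int]:
    """Prepend conditioning tokens so a next-token model sees them as context."""
    return list(condition_tokens) + list(shape_tokens)


# Hypothetical toy values: a 3-entry codebook of 2D codes and a 6-dim latent vector.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latent = [0.9, 1.2, 0.1, -0.2, -0.8, 0.7]

shape_tokens = quantize_latent(latent, codebook, chunk_size=2)
# Token id 7 stands in for an encoded condition (for example a category label).
sequence = build_sequence([7], shape_tokens)

print(shape_tokens)  # [1, 0, 2]
print(sequence)      # [7, 1, 0, 2]
```

Because the tokens come from a 1D latent vector rather than a 3D grid, the sequence has a single natural left-to-right order, which is the property the abstract credits for a more tractable auto-regressive factorization.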

Related papers