MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
- URL: http://arxiv.org/abs/2503.20519v3
- Date: Sun, 20 Apr 2025 08:36:32 GMT
- Title: MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
- Authors: Jinnan Chen, Lingting Zhu, Zeyu Hu, Shengju Qian, Yugang Chen, Xin Wang, Gim Hee Lee
- Abstract summary: We introduce MAR-3D, which integrates a pyramid variational autoencoder with a cascaded masked auto-regressive transformer. Our architecture employs random masking during training and auto-regressive denoising in random order during inference, naturally accommodating the unordered property of 3D latent tokens.
- Score: 44.94438766074643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in auto-regressive transformers have revolutionized generative modeling across domains, from language processing to visual generation, demonstrating remarkable capabilities. However, applying these advances to 3D generation presents three key challenges: the unordered nature of 3D data conflicts with the sequential next-token prediction paradigm; conventional vector quantization approaches incur substantial compression loss when applied to 3D meshes; and efficient scaling strategies for higher-resolution latent prediction are lacking. To address these challenges, we introduce MAR-3D, which integrates a pyramid variational autoencoder with a cascaded masked auto-regressive transformer (Cascaded MAR) for progressive latent upscaling in continuous space. Our architecture employs random masking during training and auto-regressive denoising in random order during inference, naturally accommodating the unordered property of 3D latent tokens. Additionally, we propose a cascaded training strategy with condition augmentation that efficiently up-scales the latent token resolution with fast convergence. Extensive experiments demonstrate that MAR-3D not only achieves superior performance and generalization compared to existing methods but also exhibits better scaling than joint distribution modeling approaches (e.g., diffusion transformers).
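The masking scheme described in the abstract, random masking during training and auto-regressive denoising in random order at inference, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the step schedule, and the stand-in predictor are all hypothetical, and a real model would condition the prediction on the unmasked tokens through a transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(n_tokens, mask_ratio):
    """Training-time masking: hide a random subset of the (unordered) latent tokens.
    The model would then be trained to predict the masked tokens from the visible ones."""
    n_mask = int(round(mask_ratio * n_tokens))
    mask = np.zeros(n_tokens, dtype=bool)
    mask[rng.permutation(n_tokens)[:n_mask]] = True
    return mask

def masked_ar_generate(n_tokens, dim, predict_fn, n_steps=4):
    """Inference: start fully masked and reveal tokens in random order over n_steps.
    predict_fn(tokens, known, reveal) returns values for the indices in `reveal`,
    conditioned on the already-revealed tokens (here it is just a stand-in)."""
    tokens = np.zeros((n_tokens, dim))
    known = np.zeros(n_tokens, dtype=bool)
    order = rng.permutation(n_tokens)          # random generation order
    per_step = int(np.ceil(n_tokens / n_steps))
    for s in range(n_steps):
        reveal = order[s * per_step:(s + 1) * per_step]
        tokens[reveal] = predict_fn(tokens, known, reveal)
        known[reveal] = True                   # newly revealed tokens become context
    return tokens
```

Because the reveal order is a fresh random permutation at each call, no fixed token ordering is imposed, which is how this style of masked auto-regression sidesteps the unordered-data problem the abstract raises.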
Related papers
- SIGMAN:Scaling 3D Human Gaussian Generation with Millions of Assets [72.26350984924129]
We propose a latent space generation paradigm for 3D human digitization.
We transform the ill-posed low-to-high-dimensional mapping problem into a learnable distribution shift.
We employ the multi-view optimization approach combined with synthetic data to construct the HGS-1M dataset.
arXiv Detail & Related papers (2025-04-09T15:38:18Z) - Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis [22.767866875051013]
We propose GS-Diff, a novel 3DGS framework guided by a multi-view diffusion model to address limitations of current methods.
By generating pseudo-observations conditioned on multi-view inputs, our method transforms under-constrained 3D reconstruction problems into well-posed ones.
Experiments on four benchmarks demonstrate that GS-Diff consistently outperforms state-of-the-art baselines by significant margins.
arXiv Detail & Related papers (2025-04-02T17:59:46Z) - Textured 3D Regenerative Morphing with 3D Diffusion Prior [29.7508625572437]
Textured 3D morphing creates smooth and plausible sequences between two 3D objects. Previous methods rely on establishing point-to-point correspondences and determining smooth deformation trajectories. We propose a method for 3D regenerative morphing using a 3D diffusion prior.
arXiv Detail & Related papers (2025-02-20T07:02:22Z) - TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models [69.0220314849478]
TripoSG is a new streamlined shape diffusion paradigm capable of generating high-fidelity 3D meshes with precise correspondence to input images. The resulting 3D shapes exhibit enhanced detail due to high-resolution capabilities and demonstrate exceptional fidelity to input images. To foster progress and innovation in the field of 3D generation, we will make our model publicly available.
arXiv Detail & Related papers (2025-02-10T16:07:54Z) - 3D-WAG: Hierarchical Wavelet-Guided Autoregressive Generation for High-Fidelity 3D Shapes [20.675695749508353]
We introduce 3D-WAG, an AR model for 3D implicit distance fields that can perform unconditional shape generation. By redefining the 3D AR generation task as "next-scale" prediction, we reduce the computational cost of generation. Our results show 3D-WAG achieves superior performance on key metrics such as Coverage and MMD.
arXiv Detail & Related papers (2024-11-28T10:33:01Z) - G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer [4.221298212125194]
We introduce G3PT, a scalable coarse-to-fine 3D generative model utilizing a cross-scale querying transformer.
Cross-scale querying transformer connects tokens globally across different levels of detail without requiring an ordered sequence.
Experiments demonstrate that G3PT achieves superior generation quality and generalization ability compared to previous 3D generation methods.
arXiv Detail & Related papers (2024-09-10T08:27:19Z) - 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning [19.763523500564542]
CHASE is a novel framework that achieves dense-input-level performance using only sparse inputs.
We introduce a Dynamic Avatar Adjustment (DAA) module, which refines deformed Gaussians by leveraging similar poses from the training set.
While designed for sparse inputs, CHASE surpasses state-of-the-art methods across both full and sparse settings on ZJU-MoCap and H36M datasets.
arXiv Detail & Related papers (2024-08-19T02:46:23Z) - Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs [16.05598829701769]
We introduce a novel diffusion architecture tailored for 3D point cloud generation: Diffusion Mamba (DiM-3D).
DiM-3D forgoes traditional attention mechanisms, instead utilizing the inherent efficiency of the Mamba architecture to maintain linear complexity with respect to sequence length.
Our empirical results on the ShapeNet benchmark demonstrate that DiM-3D achieves state-of-the-art performance in generating high-fidelity and diverse 3D shapes.
arXiv Detail & Related papers (2024-06-07T16:02:07Z) - Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z) - Training and Tuning Generative Neural Radiance Fields for Attribute-Conditional 3D-Aware Face Generation [66.21121745446345]
We propose a conditional GNeRF model that integrates specific attribute labels as input, thus amplifying the controllability and disentanglement capabilities of 3D-aware generative models.
Our approach builds upon a pre-trained 3D-aware face model, and we introduce a TRaining as Init and Optimizing for Tuning (TRIOT) method to train a conditional normalized flow module.
Our experiments substantiate the efficacy of our model, showcasing its ability to generate high-quality edits with enhanced view consistency.
arXiv Detail & Related papers (2022-08-26T10:05:39Z) - LocATe: End-to-end Localization of Actions in 3D with Transformers [91.28982770522329]
LocATe is an end-to-end approach that jointly localizes and recognizes actions in a 3D sequence.
Unlike transformer-based object-detection and classification models which consider image or patch features as input, LocATe's transformer model is capable of capturing long-term correlations between actions in a sequence.
We introduce a new, challenging, and more realistic benchmark dataset, BABEL-TAL-20 (BT20), where the performance of state-of-the-art methods is significantly worse.
arXiv Detail & Related papers (2022-03-21T03:35:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.