Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention
- URL: http://arxiv.org/abs/2507.17745v3
- Date: Thu, 31 Jul 2025 15:39:27 GMT
- Title: Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention
- Authors: Yiwen Chen, Zhihao Li, Yikai Wang, Hu Zhang, Qin Li, Chi Zhang, Guosheng Lin
- Abstract summary: We propose Ultra3D, an efficient 3D generation framework that significantly accelerates sparse voxel modeling without compromising quality. Part Attention is a geometry-aware localized attention mechanism that restricts attention computation within semantically consistent part regions. Experiments demonstrate that Ultra3D supports high-resolution 3D generation at 1024 resolution and achieves state-of-the-art performance in both visual fidelity and user preference.
- Score: 54.15345846343084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in sparse voxel representations have significantly improved the quality of 3D content generation, enabling high-resolution modeling with fine-grained geometry. However, existing frameworks suffer from severe computational inefficiencies due to the quadratic complexity of attention mechanisms in their two-stage diffusion pipelines. In this work, we propose Ultra3D, an efficient 3D generation framework that significantly accelerates sparse voxel modeling without compromising quality. Our method leverages the compact VecSet representation to efficiently generate a coarse object layout in the first stage, reducing token count and accelerating voxel coordinate prediction. To refine per-voxel latent features in the second stage, we introduce Part Attention, a geometry-aware localized attention mechanism that restricts attention computation within semantically consistent part regions. This design preserves structural continuity while avoiding unnecessary global attention, achieving up to 6.7x speed-up in latent generation. To support this mechanism, we construct a scalable part annotation pipeline that converts raw meshes into part-labeled sparse voxels. Extensive experiments demonstrate that Ultra3D supports high-resolution 3D generation at 1024 resolution and achieves state-of-the-art performance in both visual fidelity and user preference.
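The abstract's description of Part Attention maps naturally onto a masked-attention routine. Below is a minimal sketch of that idea, assuming each sparse-voxel token carries a precomputed part label from the annotation pipeline; all names and shapes are illustrative and not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def part_attention(q, k, v, part_ids):
    """q, k, v: (N, D) projected voxel tokens; part_ids: (N,) integer labels.

    Runs scaled dot-product attention independently inside each part, so the
    cost is the sum of squared part sizes rather than the squared token count.
    """
    out = torch.empty_like(v)
    for pid in part_ids.unique():
        idx = (part_ids == pid).nonzero(as_tuple=True)[0]
        # Treat each part as a short standalone sequence of shape (1, m, D).
        o = F.scaled_dot_product_attention(q[idx][None], k[idx][None], v[idx][None])
        out[idx] = o[0]
    return out

# Illustrative usage: 6 tokens split into parts {0, 0, 1, 1, 1, 2}.
q = k = v = torch.randn(6, 8)
labels = torch.tensor([0, 0, 1, 1, 1, 2])
print(part_attention(q, k, v, labels).shape)  # torch.Size([6, 8])
```

Because no token pair crosses a part boundary, global attention over all voxel tokens is avoided, which is where the reported up-to-6.7x latent-generation speed-up comes from.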
Related papers
- SHaDe: Compact and Consistent Dynamic 3D Reconstruction via Tri-Plane Deformation and Latent Diffusion [0.0]
We present a novel framework for dynamic 3D scene reconstruction that integrates three key components: an explicit tri-plane deformation field, a view-conditioned canonical field with spherical harmonics (SH) attention, and a temporally-aware latent diffusion prior. Our method encodes 4D scenes using three 2D feature planes that evolve over time, enabling an efficient and compact representation.
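As a rough illustration of the tri-plane encoding this entry relies on, the sketch below samples three axis-aligned 2D feature planes at a query point and fuses the results by summation; the fusion rule, shapes, and function names are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn.functional as F

def triplane_features(planes, xyz):
    """planes: dict mapping 'xy', 'xz', 'yz' to (1, C, H, W) feature maps.
    xyz: (N, 3) query points in [-1, 1]^3. Returns (N, C) fused features."""
    feats = 0
    for name, dims in (("xy", [0, 1]), ("xz", [0, 2]), ("yz", [1, 2])):
        uv = xyz[:, dims].view(1, -1, 1, 2)                       # grid_sample layout
        f = F.grid_sample(planes[name], uv, align_corners=True)   # (1, C, N, 1)
        feats = feats + f.view(f.shape[1], -1).t()                # accumulate (N, C)
    return feats

planes = {k: torch.randn(1, 16, 64, 64) for k in ("xy", "xz", "yz")}
print(triplane_features(planes, torch.rand(100, 3) * 2 - 1).shape)  # (100, 16)
```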
arXiv Detail & Related papers (2025-05-22T11:25:38Z)
- Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling [34.238349310770886]
We introduce Sparc3D, a unified framework that combines a sparse deformable marching cubes representation, Sparcubes, with a novel encoder, Sparconv-VAE. Sparc3D achieves state-of-the-art reconstruction fidelity on challenging inputs, including open surfaces, disconnected components, and intricate geometry.
arXiv Detail & Related papers (2025-05-20T15:44:54Z)
- SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling [79.56581753856452]
SparseFlex is a novel sparse-structured isosurface representation that enables differentiable mesh reconstruction at resolutions up to $1024^3$ directly from rendering losses. By enabling high-resolution, differentiable mesh reconstruction and generation with rendering losses, SparseFlex significantly advances the state of the art in 3D shape representation and modeling.
arXiv Detail & Related papers (2025-03-27T17:46:42Z)
- SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting [116.22623164585114]
SOGS is an anchor-based 3D-GS technique that introduces second-order anchors to achieve superior rendering quality while simultaneously reducing anchor features and model size. We show that SOGS achieves superior rendering quality in novel view synthesis with a clearly reduced model size.
arXiv Detail & Related papers (2025-03-10T15:50:46Z)
- Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels [51.08794269211701]
We introduce 3D Linear Splatting (3DLS), which replaces Gaussian kernels with linear kernels to achieve sharper and more precise results. 3DLS demonstrates state-of-the-art fidelity and accuracy, along with a 30% FPS improvement over the baseline 3DGS.
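To make the kernel swap concrete, here is a hedged sketch contrasting a 1D Gaussian falloff with a linear (tent) falloff; the exact kernel parameterization in 3DLS may differ, but the qualitative difference is that the linear kernel has compact support and a sharp cutoff.

```python
import numpy as np

def gaussian_kernel(d, sigma=1.0):
    # Smooth falloff with infinite support: never exactly reaches zero.
    return np.exp(-0.5 * (d / sigma) ** 2)

def linear_kernel(d, radius=1.0):
    # Tent falloff with compact support: exactly zero beyond `radius`.
    return np.maximum(0.0, 1.0 - np.abs(d) / radius)

d = np.linspace(0.0, 2.0, 5)
print(gaussian_kernel(d))  # [1.    0.882 0.607 0.325 0.135] -> soft tails
print(linear_kernel(d))    # [1.    0.5   0.    0.    0.   ] -> sharp cutoff
```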
arXiv Detail & Related papers (2024-11-19T11:59:54Z)
- Direct and Explicit 3D Generation from a Single Image [25.207277983430608]
We introduce a novel framework to directly generate explicit surface geometry and texture using multi-view 2D depth and RGB images.
We incorporate epipolar attention into the latent-to-pixel decoder for pixel-level multi-view consistency.
By back-projecting the generated depth pixels into 3D space, we create a structured 3D representation.
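The back-projection step is the standard pinhole lifting $X = d \cdot K^{-1}[u, v, 1]^T$. A minimal sketch follows, with illustrative intrinsics, since the abstract does not give the camera model.

```python
import numpy as np

def backproject(depth, K):
    """depth: (H, W) depth map; K: (3, 3) pinhole intrinsics.
    Returns (H*W, 3) points in camera coordinates."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))     # pixel grids, (H, W)
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T                    # K^{-1} [u, v, 1]^T
    return rays * depth.reshape(-1, 1)                 # scale rays by depth

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pts = backproject(np.full((480, 640), 2.0), K)         # flat plane at 2 m
print(pts.shape)  # (307200, 3)
```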
arXiv Detail & Related papers (2024-11-17T03:14:50Z)
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our framework, named ALOcc, achieves an optimal trade-off between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- L3DG: Latent 3D Gaussian Diffusion [74.36431175937285]
L3DG is the first approach for generative 3D modeling of 3D Gaussians through a latent 3D Gaussian diffusion formulation.
We employ a sparse convolutional architecture to efficiently operate on room-scale scenes.
By leveraging the 3D Gaussian representation, the generated scenes can be rendered from arbitrary viewpoints in real-time.
arXiv Detail & Related papers (2024-10-17T13:19:32Z)
- Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs [16.05598829701769]
We introduce a novel diffusion architecture tailored for 3D point cloud generation: Diffusion Mamba (DiM-3D).
DiM-3D forgoes traditional attention mechanisms, instead utilizing the inherent efficiency of the Mamba architecture to maintain linear complexity with respect to sequence length.
Our empirical results on the ShapeNet benchmark demonstrate that DiM-3D achieves state-of-the-art performance in generating high-fidelity and diverse 3D shapes.
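The complexity argument behind DiM-3D can be seen in miniature below: full self-attention forms an L x L score matrix, while a recurrent state-space scan visits each token once with a fixed-size state. This is a generic SSM recurrence for illustration, not Mamba's actual selective-scan kernel.

```python
import numpy as np

def attention_mix(x):
    """Full self-attention on x: (L, D). Forms an L x L matrix -> O(L^2 D)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ x

def ssm_scan(x, a=0.9):
    """Recurrent state-space scan on x: (L, D). Single pass -> O(L D)."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t, xt in enumerate(x):
        h = a * h + xt        # carry a fixed-size state instead of all pairs
        out[t] = h
    return out

x = np.random.randn(16, 4)
print(attention_mix(x).shape, ssm_scan(x).shape)  # (16, 4) (16, 4)
```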
arXiv Detail & Related papers (2024-06-07T16:02:07Z)
- Deepfake Detection: Leveraging the Power of 2D and 3D CNN Ensembles [0.0]
This work presents an innovative approach to validate video content.
The methodology blends advanced 2-dimensional and 3-dimensional Convolutional Neural Networks.
Experimental validation underscores the effectiveness of this strategy, showcasing its potential in countering deepfake generation.
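As a hedged sketch of the 2D/3D blend described here, the toy ensemble below averages a per-frame 2D CNN score with a clip-level 3D CNN score; the architectures and fusion rule are assumptions, since the abstract only states that the two kinds of networks are combined.

```python
import torch
import torch.nn as nn

# Tiny stand-in networks; real detectors would use much deeper backbones.
frame_cnn = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1),
                          nn.Flatten(), nn.Linear(8, 1))   # scores one frame
clip_cnn = nn.Sequential(nn.Conv3d(3, 8, 3, padding=1), nn.AdaptiveAvgPool3d(1),
                         nn.Flatten(), nn.Linear(8, 1))    # scores a clip

def deepfake_score(clip):                  # clip: (T, 3, H, W) video frames
    per_frame = torch.sigmoid(frame_cnn(clip)).mean()      # 2D branch
    volume = clip.permute(1, 0, 2, 3)[None]                # (1, 3, T, H, W)
    per_clip = torch.sigmoid(clip_cnn(volume)).squeeze()   # 3D branch
    return 0.5 * (per_frame + per_clip)    # average the two probabilities

print(deepfake_score(torch.rand(8, 3, 64, 64)).item())     # in (0, 1)
```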
arXiv Detail & Related papers (2023-10-25T06:00:37Z)
- TriPlaneNet: An Encoder for EG3D Inversion [1.9567015559455132]
NeRF-based GANs have introduced a number of approaches for high-resolution and high-fidelity generative modeling of human heads.
Despite the success of universal optimization-based methods for 2D GAN inversion, those applied to 3D GANs may fail to extrapolate the result onto the novel view.
We introduce a fast technique that bridges the gap between the two approaches by directly utilizing the tri-plane representation presented for the EG3D generative model.
arXiv Detail & Related papers (2023-03-23T17:56:20Z)