Unleashing Vecset Diffusion Model for Fast Shape Generation
- URL: http://arxiv.org/abs/2503.16302v2
- Date: Wed, 26 Mar 2025 15:08:12 GMT
- Title: Unleashing Vecset Diffusion Model for Fast Shape Generation
- Authors: Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Fuyun Wang, Huiwen Shi, Xianghui Yang, Qingxiang Lin, Jingwei Huang, Yuhong Liu, Jie Jiang, Chunchao Guo, Xiangyu Yue
- Abstract summary: FlashVDM is a framework for accelerating both the VAE and the DiT in the Vecset Diffusion Model (VDM). For the DiT, FlashVDM enables flexible diffusion sampling with as few as 5 inference steps at comparable quality. For the VAE, we introduce a lightning vecset decoder equipped with Adaptive KV Selection, Hierarchical Volume Decoding, and Efficient Network Design.
- Score: 21.757511934035758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D shape generation has greatly flourished through the development of so-called "native" 3D diffusion, particularly through the Vecset Diffusion Model (VDM). While recent advancements have shown promising results in generating high-resolution 3D shapes, VDM still struggles with high-speed generation. The challenge lies in accelerating not only diffusion sampling but also VAE decoding in VDM, both of which are under-explored in previous work. To address these challenges, we present FlashVDM, a systematic framework for accelerating both the VAE and the DiT in VDM. For the DiT, FlashVDM enables flexible diffusion sampling with as few as 5 inference steps at comparable quality, which is made possible by stabilizing consistency distillation with our newly introduced Progressive Flow Distillation. For the VAE, we introduce a lightning vecset decoder equipped with Adaptive KV Selection, Hierarchical Volume Decoding, and Efficient Network Design. By exploiting the locality of the vecset and the sparsity of the shape surface in the volume, our decoder drastically lowers FLOPs, minimizing the overall decoding overhead. We apply FlashVDM to Hunyuan3D-2 to obtain Hunyuan3D-2 Turbo. Through systematic evaluation, we show that our model significantly outperforms existing fast 3D generation methods, achieving performance comparable to the state-of-the-art while reducing inference time by over 45x for reconstruction and 32x for generation. Code and models are available at https://github.com/Tencent/FlashVDM.
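To make the decoder-side idea concrete, here is a minimal NumPy sketch of hierarchical (coarse-to-fine) volume decoding in the spirit of the abstract: query a coarse grid everywhere, then refine only cells near the surface, exploiting the sparsity of the shape surface in the volume. The `decode_sdf` stand-in, the two-level scheme, the resolutions, and the band threshold are illustrative assumptions, not FlashVDM's actual decoder.

```python
import numpy as np

def decode_sdf(points):
    """Hypothetical stand-in for the vecset decoder: SDF of a sphere."""
    return np.linalg.norm(points, axis=-1) - 0.5

def hierarchical_volume_decode(coarse_res=32, factor=4, band=0.1):
    """Two-level coarse-to-fine SDF decoding: query a low-resolution grid
    everywhere, then subdivide only cells whose coarse SDF lies within a
    narrow band around the surface."""
    # Coarse pass: query every cell of a low-resolution grid.
    lin = np.linspace(-1, 1, coarse_res)
    grid = np.stack(np.meshgrid(lin, lin, lin, indexing="ij"), axis=-1)
    coarse = decode_sdf(grid.reshape(-1, 3)).reshape(grid.shape[:3])

    # Fine pass: subdivide only the near-surface cells.
    near = np.argwhere(np.abs(coarse) < band)
    cell = 2.0 / (coarse_res - 1)
    offsets = np.linspace(-0.5, 0.5, factor) * cell
    sub = np.stack(np.meshgrid(offsets, offsets, offsets, indexing="ij"),
                   axis=-1).reshape(-1, 3)
    fine_pts = grid[near[:, 0], near[:, 1], near[:, 2]][:, None, :] + sub
    fine_sdf = decode_sdf(fine_pts.reshape(-1, 3))

    total = coarse.size + fine_sdf.size
    dense = (coarse_res * factor) ** 3
    print(f"queries: {total} vs dense {dense} ({dense / total:.1f}x fewer)")
    return coarse, fine_sdf

if __name__ == "__main__":
    hierarchical_volume_decode()
```

Even this toy two-level scheme issues roughly an order of magnitude fewer decoder queries than dense decoding at the same final resolution, which is the source of the FLOP savings the abstract describes.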
Related papers
- Flash-VAED: Plug-and-Play VAE Decoders for Efficient Video Generation [16.210613736589597]
Latent diffusion models have enabled high-quality video synthesis, yet their inference remains costly and time-consuming. We propose a universal acceleration framework for VAE decoders that preserves full alignment with the original latent distribution. We show that Flash-VAED accelerates the end-to-end generation pipeline by up to 36% with negligible quality drops on VBench-2.0.
arXiv Detail & Related papers (2026-02-22T12:43:50Z) - LVADNet3D: A Deep Autoencoder for Reconstructing 3D Intraventricular Flow from Sparse Hemodynamic Data [2.6043530265581505]
We propose LVADNet3D, a 3D convolutional autoencoder that reconstructs full-resolution intraventricular velocity fields from sparse velocity vector inputs. We generate a high-resolution synthetic dataset of intraventricular blood flow in LVAD-supported hearts using CFD simulations. Across various input configurations, LVADNet3D outperforms the baseline UNet3D model, yielding lower reconstruction error and higher PSNR.
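As a rough illustration of the kind of model described above, here is a minimal PyTorch sketch of a 3D convolutional autoencoder that maps a sparsely observed velocity field to a dense one. The layer widths, 32^3 grid, and 5% input sparsity are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class TinyVoxelAE3D(nn.Module):
    """Minimal 3D conv autoencoder: sparse 3-channel velocity field in,
    dense 3-channel velocity field out. Purely illustrative sizes."""

    def __init__(self, ch=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(ch, 2 * ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(2 * ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(ch, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Sparse input: keep ~5% of voxels, zero the rest (stand-in for sparse probes).
x = torch.randn(1, 3, 32, 32, 32)
mask = (torch.rand(1, 1, 32, 32, 32) < 0.05).float()
recon = TinyVoxelAE3D()(x * mask)
loss = nn.functional.mse_loss(recon, x)  # supervise against the dense field
print(recon.shape, loss.item())
```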
arXiv Detail & Related papers (2025-09-21T01:20:25Z) - Lightweight and Accurate Multi-View Stereo with Confidence-Aware Diffusion Model [81.01939699480094]
We propose a novel MVS framework that introduces diffusion models into multi-view stereo. Considering the discriminative nature of depth estimation, we design a condition encoder to guide the diffusion process. Based on our novel MVS framework, we propose two novel MVS methods, DiffMVS and CasMVS.
arXiv Detail & Related papers (2025-09-18T17:59:19Z) - FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation [7.731788894265875]
We present FVGen, a framework that enables fast novel view synthesis using Video Diffusion Models (VDMs) in as few as four sampling steps. Our framework generates the same number of novel views with similar (or even better) visual quality while reducing sampling time by more than 90%.
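For intuition on few-step sampling of the sort FVGen targets, below is a generic four-step Euler sampler over a hypothetical distilled velocity network. The toy `velocity` field (data collapsed to the origin, so the flow is perfectly straight) is an assumption chosen so that a handful of steps integrates the path exactly; it is not FVGen's model.

```python
import numpy as np

def velocity(x, t):
    """Hypothetical distilled student: a rectified-flow velocity field.
    Toy case where the data distribution is a point at the origin, so
    x_t = t * noise and dx/dt = x / t."""
    return x / t

def few_step_sample(shape=(4, 8), steps=4, seed=0):
    """Generic few-step Euler sampler: the model is queried at only
    `steps` timesteps instead of hundreds."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)           # pure noise at t = 1
    ts = np.linspace(1.0, 0.0, steps + 1)    # a handful of timesteps
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * velocity(x, t0)  # one Euler update per step
    return x

print(np.abs(few_step_sample()).max())       # ~0: reaches the data exactly
```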
arXiv Detail & Related papers (2025-08-08T15:22:41Z) - Acc3D: Accelerating Single Image to 3D Diffusion Models via Edge Consistency Guided Score Distillation [49.202383675543466]
We present Acc3D to tackle the challenge of accelerating the diffusion process to generate 3D models from single images. To derive high-quality reconstructions through few-step inference, we emphasize the critical issue of regularizing the learning of the score function in states of random noise.
arXiv Detail & Related papers (2025-03-20T09:18:10Z) - TFDM: Time-Variant Frequency-Based Point Cloud Diffusion with Mamba [20.941775037488863]
Diffusion models currently demonstrate impressive performance across various generative tasks. Recent work on image diffusion highlights the strong capabilities of Mamba (state space models). We propose a novel diffusion framework containing a dual latent Mamba block (DM-Block) and a time-variant frequency encoder (TF-Encoder).
arXiv Detail & Related papers (2025-03-17T10:00:14Z) - TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation [34.73820805875123]
TIDE (Temporal-aware Sparse Autoencoders for Interpretable Diffusion transformErs) is a novel framework that enhances temporal reconstruction within DiT activation layers across denoising steps. TIDE employs Sparse Autoencoders (SAEs) with a sparse bottleneck layer to extract interpretable and hierarchical features. Our approach achieves state-of-the-art reconstruction performance, with a mean squared error (MSE) of 1e-3 and a cosine similarity of 0.97.
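As a hedged illustration of the sparse-bottleneck idea, here is a minimal NumPy sketch of a TopK sparse autoencoder over stand-in DiT activations. The TopK rule, tied decoder, and sizes are assumptions (the summary does not say how TIDE enforces sparsity), and the weights are untrained, so the printed numbers will be far from the reported MSE of 1e-3 and cosine similarity of 0.97.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, k = 64, 512, 16   # illustrative sizes, not TIDE's

# Random weights stand in for a trained SAE.
W_enc = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
W_dec = W_enc.T.copy()               # tied decoder, a common SAE choice
b_enc = np.zeros(d_hidden)

def sae_encode(x):
    """Sparse bottleneck: keep only the top-k activations per sample."""
    h = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU pre-activations
    idx = np.argsort(h, axis=-1)[:, :-k]     # all but the k largest
    np.put_along_axis(h, idx, 0.0, axis=-1)  # zero them out
    return h

def sae_decode(h):
    return h @ W_dec

x = rng.standard_normal((8, d_model))        # stand-in DiT activations
recon = sae_decode(sae_encode(x))
mse = np.mean((recon - x) ** 2)
cos = np.mean(np.sum(recon * x, -1) /
              (np.linalg.norm(recon, axis=-1) * np.linalg.norm(x, axis=-1)))
print(f"MSE={mse:.3f}  cosine={cos:.3f}")    # untrained, so poor numbers
```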
arXiv Detail & Related papers (2025-03-10T08:35:51Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full-reference and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow [17.919092916953183]
We propose a novel framework, named FlowDreamer, which yields high-fidelity results with richer textual details and faster convergence.
The key insight is to leverage the coupling and reversible properties of the rectified flow model to search for the corresponding noise.
We introduce a novel Unique Couple Matching (UCM) loss, which guides the 3D model to optimize along the same trajectory; a sketch of the reversibility idea follows below.
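To see the reversibility being exploited, consider the toy NumPy sketch below: integrating the same velocity ODE forward and then backward recovers the starting point up to discretization error, which is what lets a rectified flow pair a sample with its corresponding noise. The linear `velocity` field is an assumption, not FlowDreamer's network.

```python
import numpy as np

def velocity(x, t):
    """Hypothetical rectified-flow velocity network. Toy linear field so
    the ODE is well-behaved in both directions."""
    return 0.5 * x

def integrate(x, t0, t1, steps=50):
    """Euler integration of dx/dt = v(x, t) from t0 to t1. Running it
    with t0 and t1 swapped inverts the map, up to discretization error."""
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)            # a "rendered sample" at t = 0
x1 = integrate(x0, 0.0, 1.0)           # forward: data -> noise side
x0_back = integrate(x1, 1.0, 0.0)      # reverse: recover the matched noise pair
print(np.max(np.abs(x0 - x0_back)))    # small: the coupling is reversible
```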
arXiv Detail & Related papers (2024-08-09T11:40:20Z) - LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation [73.36690511083894]
This paper introduces a novel framework called LN3Diff to address a unified 3D diffusion pipeline.
Our approach harnesses a 3D-aware architecture and variational autoencoder to encode the input image into a structured, compact, and 3D latent space.
It achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation.
arXiv Detail & Related papers (2024-03-18T17:54:34Z) - BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion [0.0]
BoostDream is a highly efficient plug-and-play 3D refining method designed to transform coarse 3D assets into high-quality ones.
We introduce 3D model distillation that fits differentiable representations from the 3D assets obtained through feed-forward generation.
A novel multi-view SDS loss is designed, which utilizes a multi-view aware 2D diffusion model to refine the 3D assets.
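For readers unfamiliar with score distillation, here is a hedged NumPy sketch of a multi-view SDS-style gradient: the standard w(t) * (eps_pred - eps) term averaged over several rendered views. The `predict_noise` stub, noise schedule, and weighting are illustrative assumptions; BoostDream's multi-view-aware 2D diffusion model is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x_t, t, view):
    """Hypothetical 2D diffusion epsilon-prediction; a real system would
    condition on the camera pose of `view`."""
    return 0.1 * x_t

def multi_view_sds_grad(renders, t=0.5):
    """SDS gradient averaged over views: w(t) * (eps_pred - eps),
    to be pushed back through a differentiable renderer."""
    w = 1.0 - t                      # illustrative weighting schedule
    grads = []
    for v, x0 in enumerate(renders):
        eps = rng.standard_normal(x0.shape)
        # Forward-noise the render to timestep t (simple toy schedule).
        x_t = np.sqrt(1 - t) * x0 + np.sqrt(t) * eps
        grads.append(w * (predict_noise(x_t, t, v) - eps))
    return np.mean(grads, axis=0)    # average the per-view SDS gradients

renders = [rng.standard_normal((16, 16, 3)) for _ in range(4)]
print(multi_view_sds_grad(renders).shape)
```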
arXiv Detail & Related papers (2024-01-30T05:59:00Z) - Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation [64.99362684909914]
We propose FastDiT-3D, a novel masked diffusion transformer tailored for efficient 3D point cloud generation.
We also propose a novel voxel-aware masking strategy to adaptively aggregate background/foreground information from voxelized point clouds.
Our method achieves state-of-the-art performance with an extreme masking ratio of nearly 99%.
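A minimal NumPy sketch of what a voxel-aware masking strategy can look like: voxelize the point cloud, then keep a small fraction of foreground (occupied) voxel tokens and almost none of the background, pushing the overall masking ratio toward 99%. The keep ratios and grid resolution are illustrative assumptions, not FastDiT-3D's actual strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Voxelize a toy point cloud into a 16^3 occupancy grid.
points = rng.uniform(-1, 1, size=(2048, 3))
res = 16
idx = np.clip(((points + 1) / 2 * res).astype(int), 0, res - 1)
occ = np.zeros((res, res, res), dtype=bool)
occ[idx[:, 0], idx[:, 1], idx[:, 2]] = True

# Voxel-aware masking: keep a few foreground (occupied) tokens and almost
# no background tokens. Ratios are illustrative, not the paper's.
keep = np.zeros(occ.size, dtype=bool)
fg = np.flatnonzero(occ.ravel())
bg = np.flatnonzero(~occ.ravel())
keep[rng.choice(fg, size=max(1, int(0.03 * fg.size)), replace=False)] = True
keep[rng.choice(bg, size=max(1, int(0.002 * bg.size)), replace=False)] = True

print(f"visible tokens: {keep.sum()} / {keep.size} "
      f"(masking ratio {1 - keep.mean():.3f})")
```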
arXiv Detail & Related papers (2023-12-12T12:50:33Z) - Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting [60.393072253444934]
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks.
We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation.
arXiv Detail & Related papers (2023-12-08T03:55:34Z) - EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation [57.539634387672656]
Current state-of-the-art generative diffusion models have produced impressive results but struggle to achieve fast generation without sacrificing quality.
We introduce Efficient Motion Diffusion Model (EMDM) for fast and high-quality human motion generation.
arXiv Detail & Related papers (2023-12-04T18:58:38Z) - Fast Point Cloud Generation with Straight Flows [44.76242251282731]
Point Straight Flow (PSF) is a point cloud generation model that achieves impressive performance with a single sampling step.
We develop a distillation strategy to shorten the straight path into one step without a performance loss.
We perform evaluations on multiple 3D tasks and find that our PSF performs comparably to the standard diffusion model.
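The appeal of a straightened flow is that one Euler step traverses the entire path exactly, as the toy sketch below illustrates. The `velocity` stub, which treats a unit sphere as a stand-in data manifold, is an assumption, not PSF's trained network.

```python
import numpy as np

def velocity(x1):
    """Hypothetical distilled one-step network predicting the constant
    velocity x1 - x0 of a straight path. Toy case: "data" is a unit
    sphere, obtained by projecting the noise onto it."""
    x0 = x1 / np.linalg.norm(x1, axis=-1, keepdims=True)  # pseudo data
    return x1 - x0

rng = np.random.default_rng(0)
x1 = rng.standard_normal((1024, 3))      # noise at t = 1
x0 = x1 - velocity(x1)                   # a single Euler step over the path
print(np.linalg.norm(x0, axis=-1).std()) # ~0: samples land on the sphere
```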
arXiv Detail & Related papers (2022-12-04T06:10:44Z)