Related papers: TopoDiT-3D: Topology-Aware Diffusion Transformer with Bottleneck Structure for 3D Point Cloud Generation

TopoDiT-3D: Topology-Aware Diffusion Transformer with Bottleneck Structure for 3D Point Cloud Generation

URL: http://arxiv.org/abs/2505.09140v1
Date: Wed, 14 May 2025 04:48:22 GMT
Title: TopoDiT-3D: Topology-Aware Diffusion Transformer with Bottleneck Structure for 3D Point Cloud Generation
Authors: Zechao Guan, Feng Yan, Shuai Du, Lin Ma, Qingshan Liu,
Abstract summary: TopoDiT-3D is a Topology-Aware Diffusion Transformer with a bottleneck structure for 3D point cloud generation.<n> Experimental results demonstrate that TopoDiT-3D outperforms state-of-the-art models in visual quality, diversity, and training efficiency.
Score: 16.55867442584926
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in Diffusion Transformer (DiT) models have significantly improved 3D point cloud generation. However, existing methods primarily focus on local feature extraction while overlooking global topological information, such as voids, which are crucial for maintaining shape consistency and capturing complex geometries. To address this limitation, we propose TopoDiT-3D, a Topology-Aware Diffusion Transformer with a bottleneck structure for 3D point cloud generation. Specifically, we design the bottleneck structure utilizing Perceiver Resampler, which not only offers a mode to integrate topological information extracted through persistent homology into feature learning, but also adaptively filters out redundant local features to improve training efficiency. Experimental results demonstrate that TopoDiT-3D outperforms state-of-the-art models in visual quality, diversity, and training efficiency. Furthermore, TopoDiT-3D demonstrates the importance of rich topological information for 3D point cloud generation and its synergy with conventional local feature learning. Videos and code are available at https://github.com/Zechao-Guan/TopoDiT-3D.

Related papers

TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models [69.0220314849478]
TripoSG is a new streamlined shape diffusion paradigm capable of generating high-fidelity 3D meshes with precise correspondence to input images.<n>The resulting 3D shapes exhibit enhanced detail due to high-resolution capabilities and demonstrate exceptional fidelity to input images.<n>To foster progress and innovation in the field of 3D generation, we will make our model publicly available.
arXiv Detail & Related papers (2025-02-10T16:07:54Z)
Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries. We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images. Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation [53.20147419879056]
We introduce a diffusion-based feed-forward framework to address challenges with a single model. Building upon our 3D-aware Diffusion model with TransFormer, we propose a stronger version for 3D generation, i.e., DiffTF++. Experiments on ShapeNet and OmniObject3D convincingly demonstrate the effectiveness of our proposed modules.
arXiv Detail & Related papers (2024-05-13T17:59:51Z)
Dynamic 3D Point Cloud Sequences as 2D Videos [81.46246338686478]
3D point cloud sequences serve as one of the most common and practical representation modalities of real-world environments. We propose a novel generic representation called textitStructured Point Cloud Videos (SPCVs) SPCVs re-organizes a point cloud sequence as a 2D video with spatial smoothness and temporal consistency, where the pixel values correspond to the 3D coordinates of points.
arXiv Detail & Related papers (2024-03-02T08:18:57Z)
DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation [49.22974835756199]
We propose a novel Diffusion Transformer for 3D shape generation, namely DiT-3D. Compared to existing U-Net approaches, our DiT-3D is more scalable in model size and produces much higher quality generations. Experimental results on the ShapeNet dataset demonstrate that the proposed DiT-3D achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-04T17:15:46Z)
NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions [97.27105725738016]
integration of Neural Radiance Fields (NeRFs) and generative models, such as Generative Adversarial Networks (GANs) has transformed 3D-aware generation from single-view images. We propose a simple and effective method, based on re-using the well-disentangled latent space of a pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations.
arXiv Detail & Related papers (2023-03-22T18:59:48Z)
3D Neural Field Generation using Triplane Diffusion [37.46688195622667]
We present an efficient diffusion-based model for 3D-aware generation of neural fields. Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields. We demonstrate state-of-the-art results on 3D generation on several object classes from ShapeNet.
arXiv Detail & Related papers (2022-11-30T01:55:52Z)
3D Object Detection with Pointformer [29.935891419574602]
We propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively. A Local Transformer module is employed to model interactions among points in a local region, which learns context-dependent region features at an object level. A Global Transformer is designed to learn context-aware representations at the scene level.
arXiv Detail & Related papers (2020-12-21T15:12:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.