TFDM: Time-Variant Frequency-Based Point Cloud Diffusion with Mamba
- URL: http://arxiv.org/abs/2503.13004v1
- Date: Mon, 17 Mar 2025 10:00:14 GMT
- Title: TFDM: Time-Variant Frequency-Based Point Cloud Diffusion with Mamba
- Authors: Jiaxu Liu, Li Li, Hubert P. H. Shum, Toby P. Breckon
- Abstract summary: Diffusion models currently demonstrate impressive performance over various generative tasks. Recent work on image diffusion highlights the strong capabilities of Mamba (state space models). We propose a novel diffusion framework containing a dual latent Mamba block (DM-Block) and a time-variant frequency encoder (TF-Encoder).
- Score: 20.941775037488863
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models currently demonstrate impressive performance over various generative tasks. Recent work on image diffusion highlights the strong capabilities of Mamba (state space models) due to its efficient handling of long-range dependencies and sequential data modeling. Unfortunately, joint consideration of state space models with 3D point cloud generation remains limited. To harness the powerful capabilities of the Mamba model for 3D point cloud generation, we propose a novel diffusion framework containing a dual latent Mamba block (DM-Block) and a time-variant frequency encoder (TF-Encoder). The DM-Block applies a space-filling curve to reorder points into sequences suitable for Mamba state-space modeling, while operating in a latent space to mitigate the computational overhead that arises from direct 3D data processing. Meanwhile, the TF-Encoder takes advantage of the ability of the diffusion model to refine fine details in later recovery stages by prioritizing key points within the U-Net architecture. This frequency-based mechanism ensures enhanced detail quality in the final stages of generation. Experimental results on the ShapeNet-v2 dataset demonstrate that our method achieves state-of-the-art performance (ShapeNet-v2: 0.14% on 1-NNA-Abs50 EMD and 57.90% on COV EMD) on certain metrics for specific categories while reducing computational parameters and inference time by up to 10× and 9×, respectively. Source code is available in Supplementary Materials and will be released upon acceptance.
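The abstract says the DM-Block reorders points along a space-filling curve before feeding them to Mamba, but it does not say which curve is used. The following is a minimal sketch of the general idea only, assuming a Z-order (Morton) curve; the function names `morton_code` and `morton_order`, the `bits` parameter, and the bounding-box quantization are illustrative choices and are not taken from the paper.

```python
from typing import List, Sequence


def morton_code(cell: Sequence[int], bits: int) -> int:
    """Interleave the low `bits` bits of three integer grid coordinates."""
    code = 0
    for b in range(bits):
        for axis in range(3):
            # Bit b of coordinate `axis` goes to position 3*b + axis.
            code |= ((cell[axis] >> b) & 1) << (3 * b + axis)
    return code


def morton_order(points: Sequence[Sequence[float]], bits: int = 10) -> List[int]:
    """Return point indices sorted along a Z-order curve (assumed curve type)."""
    if not points:
        return []
    # Quantize each axis to a 2**bits grid spanning the bounding box.
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    span = [(hi[a] - lo[a]) or 1.0 for a in range(3)]  # avoid division by zero
    top = (1 << bits) - 1

    def key(i: int) -> int:
        p = points[i]
        cell = [int((p[a] - lo[a]) / span[a] * top) for a in range(3)]
        return morton_code(cell, bits)

    # sorted() is stable, so points in the same grid cell keep their input order.
    return sorted(range(len(points)), key=key)


cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
sequence = [cloud[i] for i in morton_order(cloud)]
```

Sorting by such a code tends to keep spatially nearby points close together in the resulting 1D sequence, which is the property a sequence model needs; other curves (for example a Hilbert curve) trade a more involved code computation for better locality.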
Related papers
- HiSTF Mamba: Hierarchical Spatiotemporal Fusion with Multi-Granular Body-Spatial Modeling for High-Fidelity Text-to-Motion Generation [11.63340847947103]
We propose a novel HiSTF Mamba framework for text-to-motion generation.
We show that HiSTF Mamba achieves state-of-the-art performance across multiple metrics.
These findings validate the effectiveness of HiSTF Mamba in achieving high fidelity and strong semantic alignment.
arXiv Detail & Related papers (2025-03-10T04:01:48Z) - Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion [24.4023135536433]
3D semantic scene completion is critical for multiple downstream tasks in autonomous systems.
We propose a unique neural model, leveraging advances from the state space and diffusion generative modeling.
Our approach achieves remarkable 3D semantic scene completion performance with monocular image input.
arXiv Detail & Related papers (2025-01-13T12:18:58Z) - STNMamba: Mamba-based Spatial-Temporal Normality Learning for Video Anomaly Detection [48.997518615379995]
Video anomaly detection (VAD) has been extensively researched due to its potential for intelligent video systems.
Most existing methods based on CNNs and transformers still suffer from substantial computational burdens.
We propose a lightweight and effective Mamba-based network named STNMamba to enhance the learning of spatial-temporal normality.
arXiv Detail & Related papers (2024-12-28T08:49:23Z) - NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs [9.978766637766373]
We introduce a method to convert point clouds into 1D sequences that maintain 3D spatial structure with no need for data replication.
Our method does not require positional embeddings and allows for shorter sequence lengths while still achieving state-of-the-art results.
arXiv Detail & Related papers (2024-10-31T18:58:40Z) - Efficient and Scalable Point Cloud Generation with Sparse Point-Voxel Diffusion Models [6.795447206159906]
We propose a novel point cloud U-Net diffusion architecture for 3D generative modeling.
Our network employs a dual-branch architecture, combining the high-resolution representations of points with the computational efficiency of sparse voxels.
Our model excels in all tasks, establishing it as a state-of-the-art diffusion U-Net for point cloud generative modeling.
arXiv Detail & Related papers (2024-08-12T13:41:47Z) - Pamba: Enhancing Global Interaction in Point Clouds via State Space Model [37.375866491592305]
We introduce Mamba, an SSM-based architecture, to the point cloud domain and propose Pamba, a novel architecture with strong global modeling capability under linear complexity.
Pamba obtains state-of-the-art results on several 3D point cloud segmentation tasks, including ScanNet v2, ScanNet200, S3DIS and nuScenes.
arXiv Detail & Related papers (2024-06-25T10:23:53Z) - Point Cloud Mamba: Point Cloud Learning via State Space Model [73.7454734756626]
We show that Mamba-based point cloud methods can outperform previous methods based on transformer or multi-layer perceptrons (MLPs).
Point Cloud Mamba surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets.
arXiv Detail & Related papers (2024-03-01T18:59:03Z) - DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
Under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS.
arXiv Detail & Related papers (2023-12-01T17:01:06Z) - StarNet: Style-Aware 3D Point Cloud Generation [82.30389817015877]
StarNet is able to reconstruct and generate high-fidelity and even 3D point clouds using a mapping network.
Our framework achieves comparable state-of-the-art performance on various metrics in the point cloud reconstruction and generation tasks.
arXiv Detail & Related papers (2023-03-28T08:21:44Z) - Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision communities.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z) - A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion [69.32451612060214]
Real-scanned 3D point clouds are often incomplete, and it is important to recover complete point clouds for downstream applications.
Most existing point cloud completion methods use Chamfer Distance (CD) loss for training.
We propose a novel Point Diffusion-Refinement (PDR) paradigm for point cloud completion.
arXiv Detail & Related papers (2021-12-07T06:59:06Z)