Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model
- URL: http://arxiv.org/abs/2404.14966v1
- Date: Tue, 23 Apr 2024 12:20:27 GMT
- Title: Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model
- Authors: Xu Han, Yuan Tang, Zhaoxuan Wang, Xianzhi Li
- Abstract summary: The Mamba model, based on state space models (SSM), outperforms the Transformer in multiple areas with only linear complexity.
We present Mamba3D, a state space model tailored for point cloud learning to enhance local feature extraction.
- Score: 18.30032389736101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing Transformer-based models for point cloud analysis suffer from quadratic complexity, leading to compromised point cloud resolution and information loss. In contrast, the newly proposed Mamba model, based on state space models (SSM), outperforms the Transformer in multiple areas with only linear complexity. However, the straightforward adoption of Mamba does not achieve satisfactory performance on point cloud tasks. In this work, we present Mamba3D, a state space model tailored for point cloud learning to enhance local feature extraction, achieving superior performance, high efficiency, and scalability potential. Specifically, we propose a simple yet effective Local Norm Pooling (LNP) block to extract local geometric features. Additionally, to obtain better global features, we introduce a bidirectional SSM (bi-SSM) with both a token-forward SSM and a novel backward SSM that operates on the feature channel. Extensive experimental results show that Mamba3D surpasses Transformer-based counterparts and concurrent works in multiple tasks, with or without pre-training. Notably, Mamba3D achieves multiple SoTA results, including an overall accuracy of 92.6% (trained from scratch) on ScanObjectNN and 95.1% (with single-modal pre-training) on the ModelNet40 classification task, with only linear complexity.
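For intuition, below is a minimal NumPy sketch of the bidirectional-SSM idea described in the abstract: a forward linear-time scan over the point tokens combined with a backward scan applied along the feature channels. All shapes, parameter names, the diagonal state transition, and the additive fusion are illustrative assumptions rather than the authors' implementation; the point is only that each branch costs O(L·D), i.e. linear in the number of tokens, in contrast to the quadratic cost of self-attention.

```python
# Minimal sketch (NumPy) of a bidirectional SSM block in the spirit of the
# bi-SSM described above. Hypothetical names and shapes; not the authors' code.
import numpy as np


def ssm_scan(x, a, b, c):
    """Linear-time diagonal SSM recurrence over a sequence.

    x: (T, C) sequence of T steps with C channels.
    a, b, c: (C,) per-channel parameters (diagonal A, input B, output C).
    Returns y of shape (T, C); cost is O(T * C), linear in T.
    """
    T, C = x.shape
    h = np.zeros(C)              # hidden state
    y = np.empty_like(x)
    for t in range(T):           # one pass over the sequence
        h = a * h + b * x[t]     # h_t = A h_{t-1} + B x_t
        y[t] = c * h             # y_t = C h_t
    return y


def bidirectional_ssm_block(tokens, fwd_params, bwd_params):
    """Token-forward scan plus a backward scan along the feature channels.

    tokens: (L, D) point-patch embeddings. The backward branch reverses and
    transposes the tokens so the scan runs over the D feature channels,
    loosely mirroring the channel-wise backward SSM in the abstract.
    """
    y_fwd = ssm_scan(tokens, *fwd_params)           # scan over the L tokens
    y_bwd = ssm_scan(tokens.T[::-1], *bwd_params)   # reversed scan over the D channels
    y_bwd = y_bwd[::-1].T                           # restore the (L, D) layout
    return y_fwd + y_bwd                            # simple additive fusion


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, D = 128, 64                                  # 128 point tokens, 64-dim features
    tokens = rng.standard_normal((L, D))
    fwd_params = (np.full(D, 0.9), np.ones(D), np.ones(D))   # per-channel parameters
    bwd_params = (np.full(L, 0.9), np.ones(L), np.ones(L))   # per-token parameters for the channel scan
    print(bidirectional_ssm_block(tokens, fwd_params, bwd_params).shape)  # (128, 64)
```

A real implementation would vectorize or fuse the scan into a hardware-aware kernel, as Mamba does, but the asymptotic cost stays linear in the sequence length.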
Related papers
- Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model [9.718016281821471]
Inspired by the Mamba model's success in natural language processing, we propose the Serialized Point Cloud Mamba Model (Serialized Point Mamba) for point cloud segmentation.
The method achieves 76.8 mIoU on ScanNet and 70.3 mIoU on S3DIS.
arXiv Detail & Related papers (2024-07-17T05:26:58Z)
- Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model [37.375866491592305]
We introduce Mamba, an SSM-based architecture, to the point cloud domain.
We propose Mamba24/8D, which has strong global modeling capability under linear complexity.
Mamba24/8D obtains state-of-the-art results on several 3D point cloud segmentation tasks.
arXiv Detail & Related papers (2024-06-25T10:23:53Z)
- PointABM: Integrating Bidirectional State Space Model with Multi-Head Self-Attention for Point Cloud Analysis [8.500020888201231]
Mamba, based on the state space model (SSM), offers linear complexity and strong classification performance, making it well suited to 3D point cloud analysis.
Transformer has emerged as one of the most prominent and successful architectures for point cloud analysis.
We present PointABM, a hybrid model that integrates the Mamba and Transformer architectures to enhance local features and improve the performance of 3D point cloud analysis.
arXiv Detail & Related papers (2024-06-10T07:24:22Z)
- Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
- LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling [47.94285833315427]
We propose a locally constrained Compact point cloud Model (LCM) consisting of a locally constrained compact encoder and a locally constrained Mamba-based decoder.
Our results show that our compact model significantly surpasses existing Transformer-based models in both performance and efficiency.
arXiv Detail & Related papers (2024-05-27T13:19:23Z)
- Point Cloud Mamba: Point Cloud Learning via State Space Model [64.85865751243448]
This research focuses on applying the Mamba architecture to point cloud analysis.
We demonstrate that Mamba-based point cloud methods can outperform previous methods based on transformers or multi-layer perceptrons (MLPs).
Point Cloud Mamba surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets.
arXiv Detail & Related papers (2024-03-01T18:59:03Z)
- PointMamba: A Simple State Space Model for Point Cloud Analysis [65.59944745840866]
We propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks.
Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs.
arXiv Detail & Related papers (2024-02-16T14:56:13Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We adopt set transformers in this work and incorporate them into a hierarchical framework for shape classification as well as part and scene segmentation.
We also compute efficient and dynamic global cross-attention by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
- Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling [52.464516118826765]
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds.
The key to our approach is to use random point sampling instead of more complex point selection approaches (a minimal sketch of this sampling step follows after this list).
Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
arXiv Detail & Related papers (2021-07-06T05:08:34Z)
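To make the "random point sampling" highlighted in the RandLA-Net entry above concrete, here is a minimal NumPy sketch of uniform point-cloud downsampling. The function and argument names are illustrative assumptions, not the paper's API; RandLA-Net pairs this sampler with local feature aggregation, which is not shown here.

```python
# Minimal sketch (NumPy) of the uniform random point sampling that RandLA-Net
# favors over costlier selection schemes such as farthest-point sampling.
# Function and argument names are illustrative, not the paper's code.
import numpy as np


def random_point_sample(points, num_samples, rng=None):
    """Uniformly sample num_samples points from an (N, 3) point cloud.

    Drawing indices is O(N), which is what makes random sampling practical
    for clouds with ~10^6 points, at the cost of possibly dropping detail.
    """
    rng = rng or np.random.default_rng()
    idx = rng.choice(points.shape[0], size=num_samples, replace=False)
    return points[idx], idx


if __name__ == "__main__":
    cloud = np.random.default_rng(0).standard_normal((1_000_000, 3))
    subset, idx = random_point_sample(cloud, 40_000)
    print(subset.shape)  # (40000, 3)
```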