ZigzagPointMamba: Spatial-Semantic Mamba for Point Cloud Understanding
- URL: http://arxiv.org/abs/2505.21381v5
- Date: Tue, 08 Jul 2025 00:29:19 GMT
- Title: ZigzagPointMamba: Spatial-Semantic Mamba for Point Cloud Understanding
- Authors: Linshuang Diao, Dayong Ren, Sensen Song, Yurong Qian
- Abstract summary: State Space models (SSMs) such as PointMamba enable efficient feature extraction for point cloud self-supervised learning. Existing PointMamba-based methods depend on complex token ordering and random masking. We propose ZigzagPointMamba to tackle these challenges.
- Score: 2.0802801063068403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State Space models (SSMs) such as PointMamba enable efficient feature extraction for point cloud self-supervised learning with linear complexity, outperforming Transformers in computational efficiency. However, existing PointMamba-based methods depend on complex token ordering and random masking, which disrupt spatial continuity and local semantic correlations. We propose ZigzagPointMamba to tackle these challenges. The core of our approach is a simple zigzag scan path that globally sequences point cloud tokens, enhancing spatial continuity by preserving the proximity of spatially adjacent point tokens. Nevertheless, random masking undermines local semantic modeling in self-supervised learning. To address this, we introduce a Semantic-Siamese Masking Strategy (SMS), which masks semantically similar tokens to facilitate reconstruction by integrating local features of original and similar tokens. This overcomes the dependence on isolated local features and enables robust global semantic modeling. Our pre-trained ZigzagPointMamba weights significantly improve downstream tasks, achieving a 1.59% mIoU gain on ShapeNetPart for part segmentation, a 0.4% higher accuracy on ModelNet40 for classification, and 0.19%, 1.22%, and 0.72% higher accuracies respectively for the classification tasks on the OBJ-BG, OBJ-ONLY, and PB-T50-RS subsets of ScanObjectNN.
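The abstract describes a zigzag scan path that sequences point tokens so that spatially adjacent tokens stay close in the sequence. The listing does not give the exact path, so the following is only a minimal sketch of one way to get that property: a generic boustrophedon (back-and-forth) sweep over a coarse voxel grid. The function name `zigzag_order`, the `grid` parameter, and the quantization step are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def zigzag_order(centers: Sequence[Point], grid: int = 4) -> List[int]:
    """Return indices of `centers` sorted along a zigzag sweep of a voxel grid.

    Hypothetical helper: a generic boustrophedon ordering, not the paper's code.
    """
    if not centers:
        return []
    # Bounding box of the token centers, used to quantize them into cells.
    lo = [min(p[d] for p in centers) for d in range(3)]
    hi = [max(p[d] for p in centers) for d in range(3)]

    def cell(p: Point) -> Tuple[int, int, int]:
        idx = []
        for d in range(3):
            span = hi[d] - lo[d]
            t = 0.0 if span == 0 else (p[d] - lo[d]) / span
            idx.append(min(int(t * grid), grid - 1))  # clamp the upper edge
        return idx[0], idx[1], idx[2]

    def key(i: int) -> Tuple[int, int, int, int]:
        x, y, z = cell(centers[i])
        # Sweep y backwards on odd z-layers so layers connect end to end.
        y_pos = grid - 1 - y if z % 2 else y
        # Flip the x direction on every other row of the overall traversal.
        row = z * grid + y_pos
        x_pos = grid - 1 - x if row % 2 else x
        return z, y_pos, x_pos, i  # the index breaks ties inside one cell

    return sorted(range(len(centers)), key=key)


if __name__ == "__main__":
    square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
    print(zigzag_order(square, grid=2))  # [0, 1, 3, 2]
```

In this sketch, consecutive grid cells in the returned order always share a face, which is the continuity property the abstract attributes to a zigzag path. A plain row-major sort would instead jump back across the grid at the end of every row.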
Related papers
- StruMamba3D: Exploring Structural Mamba for Self-supervised Point Cloud Representation Learning [31.585380521480868]
We propose StruMamba3D, a novel paradigm for self-supervised point cloud representation learning.
We design spatial states and use them as proxies to preserve spatial dependencies among points.
Our method attains the SOTA 95.1% accuracy on ModelNet40 and 92.75% accuracy on the most challenging split of ScanObjectNN without voting strategy.
arXiv Detail & Related papers (2025-06-26T17:58:05Z)
- HyMamba: Mamba with Hybrid Geometry-Feature Coupling for Efficient Point Cloud Classification [7.139631485661567]
HyMamba is a geometry- and feature-coupled Mamba framework featuring: (1) Geometry-Feature Coupled Pooling (GFCP), which dynamically aggregates adjacent geometric information into local features; (2) Collaborative Feature Enhancer (CoFE), which enhances sparse signal capture through cross-path feature hybridization.
The proposed model achieves superior classification performance, particularly on the ModelNet40 dataset, where it raises accuracy to 95.99% with merely 0.03M additional parameters. Furthermore, it attains 98.9% accuracy on the ModelNetShot dataset, validating its robust generalization capabilities under sparse samples.
arXiv Detail & Related papers (2025-05-16T10:30:20Z)
- Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy [15.032048930130614]
We propose a novel SSM-based point cloud processing backbone, named Point Mamba, with a causality-aware ordering mechanism.
Our method achieves state-of-the-art performance compared with transformer-based counterparts, with 93.4% accuracy on the ModelNet40 classification dataset and 75.7 mIoU on the ScanNet semantic segmentation dataset.
Our method demonstrates the great potential that SSM can serve as a generic backbone in point cloud understanding.
arXiv Detail & Related papers (2024-03-11T07:07:39Z)
- Point Cloud Mamba: Point Cloud Learning via State Space Model [73.7454734756626]
We show that Mamba-based point cloud methods can outperform previous methods based on transformers or multi-layer perceptrons (MLPs).
Point Cloud Mamba surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets.
arXiv Detail & Related papers (2024-03-01T18:59:03Z)
- PointMamba: A Simple State Space Model for Point Cloud Analysis [65.59944745840866]
We propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks.
Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs.
arXiv Detail & Related papers (2024-02-16T14:56:13Z)
- Decoupled Local Aggregation for Point Cloud Learning [12.810517967372043]
We propose to decouple the explicit modelling of spatial relations from local aggregation.
We present DeLA, a lightweight point network, where in each learning stage relative spatial encodings are first formed.
DeLA achieves over 90% overall accuracy on ScanObjectNN and 74% mIoU on S3DIS Area 5.
arXiv Detail & Related papers (2023-08-31T08:21:29Z)
- PointPatchMix: Point Cloud Mixing with Patch Scoring [58.58535918705736]
We propose PointPatchMix, which mixes point clouds at the patch level and generates content-based targets for mixed point clouds.
Our approach preserves local features at the patch level, while the patch scoring module assigns targets based on the content-based significance score from a pre-trained teacher model.
With Point-MAE as our baseline, our model surpasses previous methods by a significant margin, achieving 86.3% accuracy on ScanObjectNN and 94.1% accuracy on ModelNet40.
arXiv Detail & Related papers (2023-03-12T14:49:42Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
- Fast Point Voxel Convolution Neural Network with Selective Feature Fusion for Point Cloud Semantic Segmentation [7.557684072809662]
We present a novel lightweight convolutional neural network for point cloud analysis.
Our method operates on the entire point sets without sampling and achieves good performances efficiently.
arXiv Detail & Related papers (2021-09-23T19:39:01Z)
- MST: Masked Self-Supervised Transformer for Visual Representation [52.099722121603506]
Transformers have been widely used for self-supervised pre-training in Natural Language Processing (NLP).
We present a novel Masked Self-supervised Transformer approach named MST, which can explicitly capture the local context of an image.
MST achieves Top-1 accuracy of 76.9% with DeiT-S only using 300-epoch pre-training by linear evaluation.
arXiv Detail & Related papers (2021-06-10T11:05:18Z)
- A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation [68.10621089649486]
We propose Attention Aggregation based Feature Pyramid Network (A2-FPN) to improve multi-scale feature learning.
A2-FPN achieves an improvement of 2.0% and 1.4% mask AP when integrated into the strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.
arXiv Detail & Related papers (2021-05-07T11:51:08Z)
- Joint Object Contour Points and Semantics for Instance Segmentation [1.2117737635879038]
We propose Mask Point R-CNN aiming at promoting the neural network's attention to the object boundary.
Specifically, we innovatively extend the original human keypoint detection task to the contour point detection of any object.
As a consequence, the model will be more sensitive to the edges of the object and can capture more geometric features.
arXiv Detail & Related papers (2020-08-02T11:11:28Z)
- PointINS: Point-based Instance Segmentation [117.38579097923052]
Mask representation in instance segmentation with Point-of-Interest (PoI) features is challenging because learning a high-dimensional mask feature for each instance requires a heavy computing burden.
We propose an instance-aware convolution, which decomposes this mask representation learning task into two tractable modules.
Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach.
arXiv Detail & Related papers (2020-03-13T08:24:58Z)