Related papers: PointABM:Integrating Bidirectional State Space Model with Multi-Head Self-Attention for Point Cloud Analysis

URL: http://arxiv.org/abs/2406.06069v1
Date: Mon, 10 Jun 2024 07:24:22 GMT
Title: PointABM:Integrating Bidirectional State Space Model with Multi-Head Self-Attention for Point Cloud Analysis
Authors: Jia-wei Chen, Yu-jie Xiong, Yong-bin Gao
Abstract summary: Mamba, based on the state space model (SSM), has shown its superiority in 3D point cloud analysis through its linear complexity and great success in classification. The Transformer has emerged as one of the most prominent and successful architectures for point cloud analysis. We present PointABM, a hybrid model that integrates the Mamba and Transformer architectures to enhance local features and improve the performance of 3D point cloud analysis.
Score: 8.500020888201231
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mamba, based on the state space model (SSM), has shown its superiority in 3D point cloud analysis through its linear complexity and great success in classification. Prior to that, the Transformer had emerged as one of the most prominent and successful architectures for point cloud analysis. We present PointABM, a hybrid model that integrates the Mamba and Transformer architectures to enhance local features and improve the performance of 3D point cloud analysis. To enhance the extraction of global features, we introduce a bidirectional SSM (bi-SSM) framework, which comprises both a traditional token-forward SSM and an innovative backward SSM. To enhance the bi-SSM's capability of capturing more comprehensive features without disrupting the sequence relationships required by the bidirectional Mamba, we introduce the Transformer, utilizing its self-attention mechanism to process point clouds. Extensive experimental results demonstrate that integrating Mamba with the Transformer significantly enhances the model's capability to analyze 3D point clouds.
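
The bi-SSM idea in the abstract, one scan over the token sequence in the forward direction and one in the backward direction, can be illustrated with a toy example. The following is a minimal sketch only, not the authors' implementation: it uses a scalar linear recurrence instead of Mamba's selective scan, the names `ssm_scan` and `bi_ssm` and the parameters `a`, `b`, `c` are hypothetical, and combining the two directions by summation is an assumption, since the abstract does not say how they are merged.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear state space recurrence (toy stand-in for an SSM layer).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # state update carries information from earlier tokens
        ys.append(c * h)   # readout for the current token
    return ys


def bi_ssm(xs: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Combine a forward scan with a backward scan over the same sequence.

    Assumption: both directions share parameters and are merged by summation.
    """
    fwd = ssm_scan(xs, a, b, c)
    # Backward scan: reverse the tokens, scan, then restore the original order.
    bwd = ssm_scan(xs[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(fwd, bwd)]


tokens = [1.0, 0.0, 0.0, 2.0]
forward_only = ssm_scan(tokens, 0.5, 1.0, 1.0)
both = bi_ssm(tokens)
print(forward_only)  # [1.0, 0.5, 0.25, 2.125]
print(both)          # [2.25, 1.0, 1.25, 4.125]
```

In the forward-only output the first token carries no information about the last one, while in the bidirectional output every position receives a contribution from both ends of the sequence. Each scan is a single pass, so the cost stays linear in the sequence length.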

Related papers

UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection [64.65405058535262]
Recent advances in LiDAR 3D detection have demonstrated the effectiveness of Transformer-based frameworks in capturing global dependencies in point cloud spaces. Due to the considerable number of 3D voxels and the quadratic complexity of Transformers, multiple sequences are grouped before being fed to Transformers, leading to a limited receptive field. Inspired by the impressive performance of State Space Models (SSM) in 2D vision tasks, we propose a novel Unified Mamba (UniMamba). Specifically, a UniMamba block is designed which mainly consists of locality modeling, Z-order serialization and a local-global sequential aggregator.
arXiv Detail & Related papers (2025-03-15T06:22:31Z)
3D Point Cloud Generation via Autoregressive Up-sampling [60.05226063558296]
We introduce a pioneering autoregressive generative model for 3D point cloud generation. Inspired by visual autoregressive modeling, we conceptualize point cloud generation as an autoregressive up-sampling process. PointARU progressively refines 3D point clouds from coarse to fine scales.
arXiv Detail & Related papers (2025-03-11T16:30:45Z)
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs [9.978766637766373]
We introduce a method to convert point clouds into 1D sequences that maintain 3D spatial structure with no need for data replication. Our method does not require positional embeddings and allows for shorter sequence lengths while still achieving state-of-the-art results.
arXiv Detail & Related papers (2024-10-31T18:58:40Z)
Exploring contextual modeling with linear complexity for point cloud segmentation [43.36716250540622]
We identify the key components of an effective and efficient point cloud segmentation architecture. We show that Mamba features linear computational complexity, offering superior data and inference efficiency compared to Transformers. We further enhance the standard Mamba specifically for point cloud segmentation by identifying its two key shortcomings.
arXiv Detail & Related papers (2024-10-28T16:56:30Z)
Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation [22.653014803666668]
We propose a Faster LiDAR 3D object detection framework, called FASD, which implements heterogeneous model distillation by adaptively uniform cross-model voxel features. We aim to distill the transformer's capacity for high-performance sequence modeling into Mamba models with low FLOPs, achieving a significant improvement in accuracy through knowledge transfer. We evaluated the framework on datasets and nuScenes, achieving a 4x reduction in resource consumption and a 1-2% performance improvement over the current SoTA methods.
arXiv Detail & Related papers (2024-09-17T09:30:43Z)
PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN). PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model [37.375866491592305]
We introduce Mamba, an SSM-based architecture, to the point cloud domain. We propose Mamba24/8D, which has strong global modeling capability under linear complexity. Mamba24/8D obtains state-of-the-art results on several 3D point cloud segmentation tasks.
arXiv Detail & Related papers (2024-06-25T10:23:53Z)
PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning [56.14518823931901]
We present PointRWKV, a model of linear complexity derived from the RWKV model in the NLP field. We first propose to explore the global processing capabilities within PointRWKV blocks using modified multi-headed matrix-valued states. To extract local geometric features simultaneously, we design a parallel branch to encode the point cloud efficiently in a fixed radius near-neighbors graph with a graph stabilizer.
arXiv Detail & Related papers (2024-05-24T05:02:51Z)
Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model [18.30032389736101]
The Mamba model, based on state space models (SSM), outperforms the Transformer in multiple areas with only linear complexity. We present Mamba3D, a state space model tailored for point cloud learning to enhance local feature extraction.
arXiv Detail & Related papers (2024-04-23T12:20:27Z)
Point Cloud Mamba: Point Cloud Learning via State Space Model [73.7454734756626]
We show that Mamba-based point cloud methods can outperform previous methods based on transformers or multi-layer perceptrons (MLPs). Point Cloud Mamba surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets.
arXiv Detail & Related papers (2024-03-01T18:59:03Z)
S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR). Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection. In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S^2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
PointMamba: A Simple State Space Model for Point Cloud Analysis [65.59944745840866]
We propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks. Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs.
arXiv Detail & Related papers (2024-02-16T14:56:13Z)
Pseudo-LiDAR Point Cloud Interpolation Based on 3D Motion Representation and Spatial Supervision [68.35777836993212]
We propose a Pseudo-LiDAR point cloud network to generate temporally and spatially high-quality point cloud sequences. By exploiting the scene flow between point clouds, the proposed network is able to learn a more accurate representation of the 3D spatial motion relationship.
arXiv Detail & Related papers (2020-06-20T03:11:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.