PointABM:Integrating Bidirectional State Space Model with Multi-Head Self-Attention for Point Cloud Analysis
- URL: http://arxiv.org/abs/2406.06069v1
- Date: Mon, 10 Jun 2024 07:24:22 GMT
- Title: PointABM:Integrating Bidirectional State Space Model with Multi-Head Self-Attention for Point Cloud Analysis
- Authors: Jia-wei Chen, Yu-jie Xiong, Yong-bin Gao
- Abstract summary: Mamba, built on the state space model (SSM), offers linear complexity and has achieved great success in classification, which underlines its strength in 3D point cloud analysis.
The Transformer has emerged as one of the most prominent and successful architectures for point cloud analysis.
We present PointABM, a hybrid model that integrates the Mamba and Transformer architectures to enhance local features and improve the performance of 3D point cloud analysis.
- Score: 8.500020888201231
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mamba, built on the state space model (SSM), offers linear complexity and has achieved great success in classification, which underlines its strength in 3D point cloud analysis. Prior to that, the Transformer emerged as one of the most prominent and successful architectures for point cloud analysis. We present PointABM, a hybrid model that integrates the Mamba and Transformer architectures to enhance local features and improve the performance of 3D point cloud analysis. To enhance the extraction of global features, we introduce a bidirectional SSM (bi-SSM) framework, which comprises both a traditional token-forward SSM and an innovative backward SSM. To enhance the bi-SSM's capability of capturing more comprehensive features without disrupting the sequence relationships required by the bidirectional Mamba, we introduce the Transformer, utilizing its self-attention mechanism to process point clouds. Extensive experimental results demonstrate that integrating Mamba with the Transformer significantly enhances the model's capability to analyze 3D point clouds.
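The combination described in the abstract (self-attention followed by a forward and a backward SSM pass over the token sequence) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the diagonal linear recurrence stands in for Mamba's selective SSM, the attention uses identity Q/K/V projections, and the residual wiring, the summation of the two scan directions, and all function names (`ssm_scan`, `bidirectional_ssm`, `multi_head_self_attention`, `hybrid_block`) are assumptions made for illustration.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Toy diagonal linear SSM: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t."""
    h = np.zeros(x.shape[1])
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        y[t] = c * h
    return y

def bidirectional_ssm(x, a, b, c):
    """Sum of a forward scan and a backward scan (assumed fusion rule)."""
    forward = ssm_scan(x, a, b, c)
    # Backward pass: scan the reversed sequence, then flip the result back.
    backward = ssm_scan(x[::-1], a, b, c)[::-1]
    return forward + backward

def multi_head_self_attention(x, num_heads):
    """Multi-head self-attention with identity projections (no learned weights)."""
    n, d = x.shape
    dh = d // num_heads
    heads = x.reshape(n, num_heads, dh).transpose(1, 0, 2)    # (H, N, dh)
    scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(dh)   # (H, N, N)
    scores -= scores.max(axis=-1, keepdims=True)              # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ heads                                     # (H, N, dh)
    return out.transpose(1, 0, 2).reshape(n, d)

def hybrid_block(x, a, b, c, num_heads=2):
    """Attention first, then the bidirectional scan, each with a residual (assumed order)."""
    attended = x + multi_head_self_attention(x, num_heads)
    return attended + bidirectional_ssm(attended, a, b, c)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))   # 8 point tokens with 4 feature channels
a = np.full(4, 0.9)                # per-channel state decay
b = np.ones(4)
c = np.ones(4)
out = hybrid_block(tokens, a, b, c)
```

Self-attention is permutation-equivariant, so it mixes information across tokens without changing their order, which is consistent with the abstract's remark that the sequence relationships needed by the bidirectional scan are left undisturbed.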
Related papers
- Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model [37.375866491592305]
We introduce Mamba, an SSM-based architecture, to the point cloud domain.
We propose Mamba24/8D, which has strong global modeling capability under linear complexity.
Mamba24/8D obtains state-of-the-art results on several 3D point cloud segmentation tasks.
arXiv Detail & Related papers (2024-06-25T10:23:53Z) - PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis [37.18701051669003]
PoinTramba is a hybrid framework that combines the analytical power of Transformer with the remarkable computational efficiency of Mamba.
Our approach first segments point clouds into groups, where the Transformer meticulously captures intricate intra-group dependencies.
Unlike previous Mamba approaches, we introduce a bi-directional importance-aware ordering (BIO) strategy to tackle the challenges of random ordering effects.
arXiv Detail & Related papers (2024-05-24T11:36:26Z) - SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation [74.07836010698801]
We propose an SMPL-based Transformer framework (SMPLer) to address this issue.
SMPLer incorporates two key ingredients: a decoupled attention operation and an SMPL-based target representation.
Extensive experiments demonstrate the effectiveness of SMPLer against existing 3D human shape and pose estimation methods.
arXiv Detail & Related papers (2024-04-23T17:59:59Z) - Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model [18.30032389736101]
The Mamba model, based on state space models (SSMs), outperforms the Transformer in multiple areas with only linear complexity.
We present Mamba3D, a state space model tailored for point cloud learning to enhance local feature extraction.
arXiv Detail & Related papers (2024-04-23T12:20:27Z) - Point Cloud Mamba: Point Cloud Learning via State Space Model [64.85865751243448]
This research focuses on applying such architecture in point cloud analysis.
We demonstrate that Mamba-based point cloud methods can outperform previous methods based on transformers or multi-layer perceptrons (MLPs).
Point Cloud Mamba surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets.
arXiv Detail & Related papers (2024-03-01T18:59:03Z) - S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph
Generation in OR [52.964721233679406]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning that generates semantic scene graphs dependent on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bimodal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z) - PointMamba: A Simple State Space Model for Point Cloud Analysis [65.59944745840866]
We propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks.
Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs.
arXiv Detail & Related papers (2024-02-16T14:56:13Z) - Dual Transformer for Point Cloud Analysis [2.160196691362033]
We present a novel point cloud representation learning architecture, named Dual Transformer Network (DTNet).
Specifically, by aggregating the well-designed point-wise and channel-wise multi-head self-attention models simultaneously, the DPCT module can capture much richer contextual dependencies semantically from the perspectives of position and channel.
Extensive quantitative and qualitative experiments on publicly available benchmarks demonstrate the effectiveness of our proposed transformer framework for the tasks of 3D point cloud classification and segmentation, achieving highly competitive performance in comparison with the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-27T08:41:02Z) - LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner.
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z) - Pseudo-LiDAR Point Cloud Interpolation Based on 3D Motion Representation and Spatial Supervision [68.35777836993212]
We propose a Pseudo-LiDAR point cloud network to generate temporally and spatially high-quality point cloud sequences.
By exploiting the scene flow between point clouds, the proposed network is able to learn a more accurate representation of the 3D spatial motion relationship.
arXiv Detail & Related papers (2020-06-20T03:11:04Z)