LaSSM: Efficient Semantic-Spatial Query Decoding via Local Aggregation and State Space Models for 3D Instance Segmentation
- URL: http://arxiv.org/abs/2602.11007v1
- Date: Wed, 11 Feb 2026 16:34:12 GMT
- Title: LaSSM: Efficient Semantic-Spatial Query Decoding via Local Aggregation and State Space Models for 3D Instance Segmentation
- Authors: Lei Yao, Yi Wang, Yawen Cui, Moyun Liu, Lap-Pui Chau
- Abstract summary: We introduce LaSSM, which prioritizes simplicity and efficiency while maintaining competitive performance. We also present a coordinate-guided state space model (SSM) decoder that progressively refines queries. LaSSM ranks first on the latest ScanNet++ V2 leaderboard, outperforming the previous best method by 2.5% mAP with only 1/3 of the FLOPs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Query-based 3D scene instance segmentation from point clouds has attained notable performance. However, existing methods suffer from a query initialization dilemma due to the sparse nature of point clouds and rely on computationally intensive attention mechanisms in their query decoders. We accordingly introduce LaSSM, which prioritizes simplicity and efficiency while maintaining competitive performance. Specifically, we propose a hierarchical semantic-spatial query initializer that derives the query set from superpoints by considering both semantic cues and spatial distribution, achieving comprehensive scene coverage and accelerated convergence. We further present a coordinate-guided state space model (SSM) decoder that progressively refines queries. The novel decoder features a local aggregation scheme that restricts the model's focus to geometrically coherent regions and a spatial dual-path SSM block that captures underlying dependencies within the query set by integrating the associated coordinate information. Our design enables efficient instance prediction, avoiding the incorporation of noisy information and reducing redundant computation. LaSSM ranks first on the latest ScanNet++ V2 leaderboard, outperforming the previous best method by 2.5% mAP with only 1/3 of the FLOPs, demonstrating its superiority in challenging large-scale scene instance segmentation. LaSSM also achieves competitive performance on the ScanNet, ScanNet200, S3DIS, and ScanNet++ V1 benchmarks at lower computational cost. Extensive ablation studies and qualitative results validate the effectiveness of our design. The code and weights are available at https://github.com/RayYoh/LaSSM.
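The abstract does not spell out the decoder's internals, but the general shape of a "spatial dual-path SSM block" can be sketched: order the queries along a spatial axis using their associated coordinates, then run a linear recurrent scan in both directions and fuse the results. The sketch below is a hypothetical simplification (all names invented); LaSSM's actual block uses input-dependent, selective SSM parameters rather than a fixed decay.

```python
import numpy as np

def dual_path_ssm_scan(queries, coords, decay=0.9):
    """Toy bidirectional linear scan over a query set ordered along one
    spatial axis. Hypothetical sketch only: a real selective SSM uses
    input-dependent, discretized parameters per step."""
    order = np.argsort(coords[:, 0])        # order queries along the x axis
    x = queries[order]
    n, d = x.shape
    fwd = np.zeros_like(x)
    h = np.zeros(d)
    for t in range(n):                      # forward recurrence: h = a*h + x_t
        h = decay * h + x[t]
        fwd[t] = h
    bwd = np.zeros_like(x)
    h = np.zeros(d)
    for t in range(n - 1, -1, -1):          # backward recurrence
        h = decay * h + x[t]
        bwd[t] = h
    fused = fwd + bwd - x                   # x_t appears once in each path
    out = np.empty_like(x)
    out[order] = fused                      # restore the original query order
    return out
```

With `decay=0` each state reduces to the current query, so the scan is an identity map; larger decays mix information from spatially neighboring queries, which is the intuition behind coordinate-guided refinement.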
Related papers
- StruMamba3D: Exploring Structural Mamba for Self-supervised Point Cloud Representation Learning [31.585380521480868]
We propose StruMamba3D, a novel paradigm for self-supervised point cloud representation learning. We design spatial states and use them as proxies to preserve spatial dependencies among points. Our method attains state-of-the-art 95.1% accuracy on ModelNet40 and 92.75% accuracy on the most challenging split of ScanObjectNN without a voting strategy.
arXiv Detail & Related papers (2025-06-26T17:58:05Z) - DySS: Dynamic Queries and State-Space Learning for Efficient 3D Object Detection from Multi-Camera Videos [53.52664872583893]
Camera-based 3D object detection in Bird's Eye View (BEV) is one of the most important perception tasks in autonomous driving. We propose DySS, a novel method that employs state-space learning and dynamic queries. Our proposed DySS achieves both superior detection performance and efficient inference.
arXiv Detail & Related papers (2025-06-11T23:49:56Z) - SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot [8.080568103779893]
State-space language models such as Mamba match Transformer quality while permitting linear-complexity inference. Existing one-shot pruning methods are tailored to attention blocks and fail to account for the time-shared and discretized state-transition matrix. We introduce SparseSSM, the first training-free pruning framework that extends the classic optimal brain surgeon (OBS) framework to state space architectures.
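The OBS criterion the summary refers to ranks each weight by the loss increase its removal would cause, s_i = w_i^2 / (2 [H^-1]_ii). A minimal one-shot sketch with a diagonal Hessian approximation (so [H^-1]_ii ≈ 1 / H_ii) might look like this; it is a toy illustration of the criterion, not SparseSSM's handling of the shared, discretized transition matrix:

```python
import numpy as np

def obs_one_shot_prune(W, hess_diag, sparsity=0.5):
    """Zero the weights with the smallest OBS saliency w^2 / (2 * [H^-1]_ii),
    using a diagonal Hessian approximation. Toy sketch only."""
    inv_h = 1.0 / np.maximum(hess_diag, 1e-12)   # diagonal Hessian inverse
    saliency = (W ** 2) / (2.0 * inv_h)
    k = int(sparsity * W.size)                   # number of weights to drop
    drop = np.argsort(saliency, axis=None)[:k]   # lowest-saliency weights
    mask = np.ones(W.size, dtype=bool)
    mask[drop] = False
    return W * mask.reshape(W.shape)
```

With a uniform Hessian the criterion degenerates to magnitude pruning; the Hessian term is what lets OBS keep small weights that the loss is nonetheless sensitive to.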
arXiv Detail & Related papers (2025-06-11T11:14:57Z) - State Space Model Meets Transformer: A New Paradigm for 3D Object Detection [33.49952392298874]
We propose a new 3D object DEtection paradigm with an interactive STate space model (DEST). In the interactive SSM, we design a novel state-dependent SSM parameterization method that enables system states to effectively serve as queries in 3D indoor detection tasks. Our method improves the GroupFree baseline in terms of AP50 on the ScanNet V2 and SUN RGB-D datasets.
arXiv Detail & Related papers (2025-03-18T17:58:03Z) - CS-Net: Contribution-based Sampling Network for Point Cloud Simplification [50.55658910053004]
Point cloud sampling plays a crucial role in reducing computation costs and storage requirements for various vision tasks. Traditional sampling methods, such as farthest point sampling, lack task-specific information. We propose a contribution-based sampling network (CS-Net), where the sampling operation is formulated as a Top-k operation.
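The Top-k formulation itself is simple to state: given per-point contribution scores, keep the k highest-scoring points. In CS-Net the scores are predicted by a task-driven network; in the sketch below they are supplied directly, so this only illustrates the selection step, not the learned scoring:

```python
import numpy as np

def topk_sample(points, scores, k):
    """Keep the k points with the highest contribution scores.
    Scores are given here; CS-Net learns them end-to-end."""
    idx = np.argpartition(-scores, k - 1)[:k]   # unordered top-k indices
    return points[idx], idx
```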
arXiv Detail & Related papers (2025-01-18T14:56:09Z) - OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at nearly 2x the FPS, while our heaviest model surpasses the previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z) - SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation [14.214197948110115]
This paper introduces a novel method, named SGIFormer, for 3D instance segmentation.
It is composed of the Semantic-guided Mix Query (SMQ) and the Geometric-enhanced Interleaving Transformer (GIT) decoder.
It attains state-of-the-art performance on ScanNet V2, ScanNet200, and the challenging high-fidelity ScanNet++ benchmark.
arXiv Detail & Related papers (2024-07-16T10:17:28Z) - Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS).
We focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution.
To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z) - SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting [67.97870844244187]
The class-agnostic counting (CAC) task has recently been proposed to solve the problem of counting all objects of an arbitrary class given several exemplars in the input image. We propose a novel localization-based CAC approach, termed Scale-modulated Query and Localization Network (SQLNet). It fully explores the scales of exemplars in both the query and localization stages and achieves effective counting by accurately locating each object and predicting its approximate size.
arXiv Detail & Related papers (2023-11-16T16:50:56Z) - Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely the dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region-of-interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z) - Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling [52.464516118826765]
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds.
The key to our approach is to use random point sampling instead of more complex point selection approaches.
Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
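The speedup comes from the sampling step alone: uniform random sampling is O(1) per selected point, whereas farthest point sampling repeatedly scans all remaining points. A small sketch contrasting the two (function names are illustrative, not from the RandLA-Net codebase):

```python
import numpy as np

def random_sample(points, k, seed=0):
    """Uniform random downsampling, the cheap strategy RandLA-Net adopts."""
    rng = np.random.default_rng(seed)
    return points[rng.choice(len(points), size=k, replace=False)]

def farthest_point_sample(points, k):
    """Naive O(N*k) farthest point sampling, shown for contrast."""
    idx = [0]                                        # start from point 0
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))                   # farthest from chosen set
        idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[idx]
```

Random sampling gives no coverage guarantee per draw, which is why RandLA-Net pairs it with local feature aggregation to compensate for points it discards.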
arXiv Detail & Related papers (2021-07-06T05:08:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.