MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model
- URL: http://arxiv.org/abs/2404.12794v2
- Date: Tue, 6 Aug 2024 03:28:12 GMT
- Title: MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model
- Authors: Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang
- Abstract summary: LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in the point cloud of the current scan using motion information from previous scans.
We propose MambaMOS, a novel LiDAR-based 3D Moving Object Segmentation network with a Motion-aware State Space Model.
- Score: 15.418115686945056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in the point cloud of the current scan using motion information from previous scans. Despite the promising results achieved by previous MOS methods, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose a novel LiDAR-based 3D Moving Object Segmentation framework with a Motion-aware State Space Model, termed MambaMOS. Firstly, we develop a novel embedding module, the Time Clue Bootstrapping Embedding (TCBE), to enhance the coupling of temporal and spatial information in point clouds and alleviate the issue of overlooked temporal clues. Secondly, we introduce the Motion-aware State Space Model (MSSM) to endow the model with the capacity to understand the temporal correlations of the same object across different time steps. Specifically, MSSM emphasizes the motion states of the same object at different time steps through two distinct temporal modeling and correlation steps, and represents these motion differences with an improved state space model, substantially strengthening the modeling of motion states. Finally, extensive experiments on the SemanticKITTI-MOS and KITTI-Road benchmarks demonstrate that the proposed MambaMOS achieves state-of-the-art performance. The source code is publicly available at https://github.com/Terminal-K/MambaMOS.
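To make the state-space idea concrete, below is a minimal, hypothetical PyTorch sketch of a selective state space scan applied to per-point temporal feature sequences (one feature vector per past scan), so that differences between time steps, i.e., motion, accumulate in the hidden state. All names, shapes, and the simplified recurrence are assumptions for illustration; this is not the actual MambaMOS implementation of TCBE or MSSM.

```python
# Hypothetical sketch: a simple discretized, input-dependent ("selective") SSM
# scan over the time axis of per-point features. Not MambaMOS code.
import torch


def selective_ssm_scan(x, A, B, C, dt):
    """Run a discretized SSM recurrence over T consecutive scans.

    x:  (N, T, D)  per-point features, one vector per scan
    A:  (D, S)     state-transition parameters (log-space, negated for stability)
    B:  (N, T, S)  input-dependent input projection
    C:  (N, T, S)  input-dependent output projection
    dt: (N, T, D)  per-step discretization step sizes
    """
    N, T, D = x.shape
    # Zero-order-hold style discretization of the continuous parameters.
    dA = torch.exp(dt.unsqueeze(-1) * (-torch.exp(A)))          # (N, T, D, S)
    dBx = dt.unsqueeze(-1) * B.unsqueeze(2) * x.unsqueeze(-1)   # (N, T, D, S)

    h = x.new_zeros(N, D, A.shape[-1])
    ys = []
    for t in range(T):  # sequential scan, oldest scan to newest
        h = dA[:, t] * h + dBx[:, t]                    # update motion state
        ys.append((h * C[:, t].unsqueeze(1)).sum(-1))   # read out (N, D)
    return torch.stack(ys, dim=1)                       # (N, T, D)


# Toy usage: 1024 points, 8 scans, 32-dim features, 16-dim hidden state.
N, T, D, S = 1024, 8, 32, 16
x = torch.randn(N, T, D)
A = torch.randn(D, S)
B, C = torch.randn(N, T, S), torch.randn(N, T, S)
dt = torch.rand(N, T, D)
y = selective_ssm_scan(x, A, B, C, dt)
print(y.shape)  # torch.Size([1024, 8, 32])
```

In practice, the per-scan time index would also be embedded into `x` (in the spirit of the paper's TCBE module) before the scan, so that temporal and spatial information are coupled at the feature level rather than concatenated as an afterthought.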
Related papers
- Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models [70.78051873517285]
We present MotionBase, the first million-level motion generation benchmark.
By leveraging this vast dataset, our large motion model demonstrates strong performance across a broad range of motions.
We introduce a novel 2D lookup-free approach for motion tokenization, which preserves motion information and expands codebook capacity.
arXiv Detail & Related papers (2024-10-04T10:48:54Z) - Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences [25.74000325019015]
We introduce a novel LiDAR 3D object detection framework, namely LiSTM, to facilitate spatial-temporal feature learning with cross-frame motion forecasting information.
We have conducted experiments on large-scale autonomous driving datasets, including nuScenes, to demonstrate that the proposed framework achieves superior 3D detection performance.
arXiv Detail & Related papers (2024-09-06T16:29:04Z) - MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation [4.386035726986601]
How to effectively utilize motion and semantic features and avoid information loss during 3D-to-2D projection is still a key challenge.
We propose a novel multi-view MOS model (MV-MOS) by fusing motion-semantic features from different 2D representations of point clouds.
We validated the effectiveness of the proposed multi-branch fusion MOS framework via comprehensive experiments.
arXiv Detail & Related papers (2024-08-20T07:30:00Z) - Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z) - MF-MOS: A Motion-Focused Model for Moving Object Segmentation [10.533968185642415]
Moving object segmentation (MOS) provides a reliable solution for detecting traffic participants.
Previous methods capture motion features from the range images directly.
We propose MF-MOS, a novel motion-focused model with a dual-branch structure for LiDAR moving object segmentation.
arXiv Detail & Related papers (2024-01-30T13:55:56Z) - Delving into Motion-Aware Matching for Monocular 3D Object Tracking [81.68608983602581]
We find that the motion cue of objects along different time frames is critical in 3D multi-object tracking.
We propose MoMA-M3T, a framework that mainly consists of three motion-aware components.
We conduct extensive experiments on the nuScenes and KITTI datasets to demonstrate that our MoMA-M3T achieves competitive performance against state-of-the-art methods.
arXiv Detail & Related papers (2023-08-22T17:53:58Z) - Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z) - Event-Free Moving Object Segmentation from Moving Ego Vehicle [88.33470650615162]
Moving object segmentation (MOS) in dynamic scenes is an important, challenging, but under-explored research topic for autonomous driving.
Most segmentation methods leverage motion cues obtained from optical flow maps.
We propose to exploit event cameras for better video understanding, which provide rich motion cues without relying on optical flow.
arXiv Detail & Related papers (2023-04-28T23:43:10Z) - Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation [23.666607237164186]
We propose a novel deep neural network exploiting both spatial-temporal information and different representation modalities of LiDAR scans to improve LiDAR-MOS performance.
Specifically, we first use a range image-based dual-branch structure to separately deal with spatial and temporal information.
We also use a point refinement module via 3D sparse convolution to fuse the information from both LiDAR range image and point cloud representations.
arXiv Detail & Related papers (2022-07-05T17:59:17Z) - Exploring Motion and Appearance Information for Temporal Sentence Grounding [52.01687915910648]
We propose a Motion-Appearance Reasoning Network (MARN) to solve temporal sentence grounding.
We develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations.
Our proposed MARN significantly outperforms previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-01-03T02:44:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.