MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with
Bird's Eye View based Appearance and Motion Features
- URL: http://arxiv.org/abs/2305.07336v2
- Date: Tue, 1 Aug 2023 09:16:32 GMT
- Title: MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with
Bird's Eye View based Appearance and Motion Features
- Authors: Bo Zhou, Jiapeng Xie, Yan Pan, Jiajie Wu, and Chuanzhao Lu
- Abstract summary: We present MotionBEV, a fast and accurate framework for LiDAR moving object segmentation.
Our approach converts 3D LiDAR scans into a 2D polar BEV representation to improve computational efficiency.
We employ a dual-branch network bridged by the Appearance-Motion Co-attention Module (AMCM) to adaptively fuse the spatio-temporal information from appearance and motion features.
- Score: 5.186531650935954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying moving objects is an essential capability for autonomous systems,
as it provides critical information for pose estimation, navigation, collision
avoidance, and static map construction. In this paper, we present MotionBEV, a
fast and accurate framework for LiDAR moving object segmentation, which
segments moving objects with appearance and motion features in the bird's eye
view (BEV) domain. Our approach converts 3D LiDAR scans into a 2D polar BEV
representation to improve computational efficiency. Specifically, we learn
appearance features with a simplified PointNet and compute motion features
through the height differences of consecutive frames of point clouds projected
onto vertical columns in the polar BEV coordinate system. We employ a
dual-branch network bridged by the Appearance-Motion Co-attention Module (AMCM)
to adaptively fuse the spatio-temporal information from appearance and motion
features. Our approach achieves state-of-the-art performance on the
SemanticKITTI-MOS benchmark. Furthermore, to demonstrate the practical
effectiveness of our method, we provide a LiDAR-MOS dataset recorded by a
solid-state LiDAR, which features non-repetitive scanning patterns and a small
field of view.
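The abstract's motion feature (height differences between consecutive frames, taken per vertical column of a polar BEV grid) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the grid sizes, the z-range filter, and the choice of "occupied height" as max z minus min z per cell are all assumptions made for illustration, and the previous frame is assumed to be already transformed into the current sensor pose.

```python
import numpy as np

def polar_bev_height_map(points, r_max=50.0, n_r=4, n_theta=8,
                         z_min=-4.0, z_max=2.0):
    """Occupied height (max z - min z) per polar BEV cell; 0 for empty cells.

    `points` is an (N, 3) array of x, y, z coordinates in the sensor frame.
    All grid parameters are illustrative assumptions, not values from the paper.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)               # radial distance in the ground plane
    theta = np.arctan2(y, x)         # azimuth in [-pi, pi]

    # Drop points outside the assumed radial and height limits.
    keep = (r < r_max) & (z >= z_min) & (z <= z_max)
    r, theta, z = r[keep], theta[keep], z[keep]

    # Discretise into radial and angular bin indices.
    ri = np.minimum((r / r_max * n_r).astype(int), n_r - 1)
    ti = np.minimum(((theta + np.pi) / (2 * np.pi) * n_theta).astype(int),
                    n_theta - 1)

    # Track the highest and lowest point falling into each cell.
    z_hi = np.full((n_r, n_theta), -np.inf)
    z_lo = np.full((n_r, n_theta), np.inf)
    np.maximum.at(z_hi, (ri, ti), z)
    np.minimum.at(z_lo, (ri, ti), z)

    occupied = np.isfinite(z_hi)
    return np.where(occupied, z_hi - z_lo, 0.0)

def motion_feature(curr_points, prev_points_aligned, **grid):
    """Per-cell height difference between the current and the previous frame.

    `prev_points_aligned` is assumed to be expressed in the current sensor pose.
    """
    return (polar_bev_height_map(curr_points, **grid)
            - polar_bev_height_map(prev_points_aligned, **grid))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prev_scan = rng.uniform([-40, -40, -1], [40, 40, 1], size=(1000, 3))
    curr_scan = prev_scan + np.array([0.5, 0.0, 0.0])  # toy shift along x
    print(motion_feature(curr_scan, prev_scan).shape)
```

Cells whose occupied height is unchanged between frames give a zero residual, while a column that an object has entered or left gives a positive or negative value. In this sketch that residual map is the only motion cue; the fusion with appearance features described in the abstract is not modelled here.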
Related papers
- CV-MOS: A Cross-View Model for Motion Segmentation [13.378850442525945]
We introduce CV-MOS, a cross-view model for moving object segmentation.
We decouple spatial-temporal information by capturing the motion from BEV and RV residual maps.
Our method achieved leading IoU scores of 77.5% and 79.2% on the validation and test sets of the SemanticKITTI dataset.
arXiv Detail & Related papers (2024-08-25T09:39:26Z)
- MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation [4.386035726986601]
How to effectively utilize motion and semantic features and avoid information loss during 3D-to-2D projection is still a key challenge.
We propose a novel multi-view MOS model (MV-MOS) by fusing motion-semantic features from different 2D representations of point clouds.
We validated the effectiveness of the proposed multi-branch fusion MOS framework via comprehensive experiments.
arXiv Detail & Related papers (2024-08-20T07:30:00Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [12.713417063678335]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
- An Effective Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds [50.19288542498838]
3D single object tracking in LiDAR point clouds (LiDAR SOT) plays a crucial role in autonomous driving.
Current approaches all follow the Siamese paradigm based on appearance matching.
We introduce a motion-centric paradigm to handle LiDAR SOT from a new perspective.
arXiv Detail & Related papers (2023-03-21T17:28:44Z)
- Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
- Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation [23.666607237164186]
We propose a novel deep neural network exploiting both spatial-temporal information and different representation modalities of LiDAR scans to improve LiDAR-MOS performance.
Specifically, we first use a range image-based dual-branch structure to separately deal with spatial and temporal information.
We also use a point refinement module via 3D sparse convolution to fuse the information from both LiDAR range image and point cloud representations.
arXiv Detail & Related papers (2022-07-05T17:59:17Z)
- LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods in both tasks.
We extend DS-Net to 4D panoptic LiDAR segmentation by the temporally unified instance clustering on aligned LiDAR frames.
arXiv Detail & Related papers (2022-03-14T15:25:42Z)
- Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)
- LiMoSeg: Real-time Bird's Eye View based LiDAR Motion Segmentation [8.184561295177623]
This paper proposes a novel real-time architecture for motion segmentation of Light Detection and Ranging (LiDAR) data.
We use two successive scans of LiDAR data in 2D Bird's Eye View representation to perform pixel-wise classification as static or moving.
We demonstrate a low latency of 8 ms on a commonly used automotive embedded platform, namely Nvidia Jetson Xavier.
arXiv Detail & Related papers (2021-11-08T23:40:55Z)
- LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner.
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z)
- LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention [100.52873557168637]
3D object detectors usually focus on single-frame detection, ignoring the information in consecutive point cloud frames.
In this paper, we propose an end-to-end online 3D video object detector that operates on point sequences.
arXiv Detail & Related papers (2020-04-03T06:06:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.