MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation
- URL: http://arxiv.org/abs/2408.10602v1
- Date: Tue, 20 Aug 2024 07:30:00 GMT
- Title: MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation
- Authors: Jintao Cheng, Xingming Chen, Jinxin Liang, Xiaoyu Tang, Xieyuanli Chen, Dachuan Li
- Abstract summary: How to effectively utilize motion and semantic features and avoid information loss during 3D-to-2D projection is still a key challenge.
We propose a novel multi-view MOS model (MV-MOS) by fusing motion-semantic features from different 2D representations of point clouds.
We validated the effectiveness of the proposed multi-branch fusion MOS framework via comprehensive experiments.
- Score: 4.386035726986601
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effectively summarizing dense 3D point cloud data and extracting motion information of moving objects (moving object segmentation, MOS) is crucial to autonomous driving and robotics applications. How to effectively utilize motion and semantic features and avoid information loss during 3D-to-2D projection is still a key challenge. In this paper, we propose a novel multi-view MOS model (MV-MOS) by fusing motion-semantic features from different 2D representations of point clouds. To effectively exploit complementary information, the motion branches of the proposed model combine motion features from both bird's eye view (BEV) and range view (RV) representations. In addition, a semantic branch is introduced to provide supplementary semantic features of moving objects. Finally, a Mamba module is utilized to fuse the semantic features with motion features and provide effective guidance for the motion branches. We validated the effectiveness of the proposed multi-branch fusion MOS framework via comprehensive experiments, and our proposed model outperforms existing state-of-the-art models on the SemanticKITTI benchmark.
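The abstract refers to two 2D representations of a point cloud, range view (RV) and bird's eye view (BEV). The sketch below shows one common way to compute both projections in plain Python. It is an illustration only, not the authors' implementation: the function names, the 64 x 2048 image size, the +3/-25 degree vertical field of view, and the Cartesian 0.5 m BEV grid are all assumptions (some BEV-based MOS methods use a polar grid instead).

```python
import math

def range_view_pixel(x, y, z, width=2048, height=64,
                     fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map one LiDAR point to (row, col, depth) in a spherical range image.

    Image size and field of view are assumed values, not taken from the paper.
    """
    depth = math.sqrt(x * x + y * y + z * z)
    if depth == 0.0:
        return None  # a point at the sensor origin has no direction
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down
    yaw = -math.atan2(y, x)          # horizontal angle
    pitch = math.asin(z / depth)     # vertical angle
    col = 0.5 * (yaw / math.pi + 1.0) * width
    row = (1.0 - (pitch - fov_down) / fov) * height
    # Clamp so that points outside the field of view land on the border.
    col = min(width - 1, max(0, int(math.floor(col))))
    row = min(height - 1, max(0, int(math.floor(row))))
    return row, col, depth

def bev_cell(x, y, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), cell=0.5):
    """Map one point to a (row, col) cell of a Cartesian BEV grid (assumed layout)."""
    if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
        return None  # outside the grid
    return int((x - x_range[0]) / cell), int((y - y_range[0]) / cell)

def project(points):
    """Build sparse RV and BEV maps from an iterable of (x, y, z) points."""
    range_image = {}  # (row, col) -> nearest depth seen at that pixel
    bev_height = {}   # (row, col) -> highest z seen in that cell
    for x, y, z in points:
        rv = range_view_pixel(x, y, z)
        if rv is not None:
            row, col, depth = rv
            if (row, col) not in range_image or depth < range_image[(row, col)]:
                range_image[(row, col)] = depth
        cell = bev_cell(x, y)
        if cell is not None:
            bev_height[cell] = max(z, bev_height.get(cell, z))
    return range_image, bev_height

if __name__ == "__main__":
    rv, bev = project([(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (10.1, 0.1, 1.0)])
    print(rv)
    print(bev)
```

Both projections discard information: the range image keeps only the nearest point per pixel, and the BEV grid collapses the vertical axis. This is the kind of 3D-to-2D information loss that motivates combining the two views.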
Related papers
- CV-MOS: A Cross-View Model for Motion Segmentation [13.378850442525945]
We introduce CV-MOS, a cross-view model for moving object segmentation.
We decouple spatial-temporal information by capturing the motion from BEV and RV residual maps.
Our method achieved leading IoU scores of 77.5% and 79.2% on the validation and test sets of the SemanticKITTI dataset.
arXiv Detail & Related papers (2024-08-25T09:39:26Z)
- ProMotion: Prototypes As Motion Learners [46.08051377180652]
We introduce ProMotion, a unified prototypical framework engineered to model fundamental motion tasks.
ProMotion offers a range of compelling attributes that set it apart from current task-specific paradigms.
We capitalize on a dual mechanism involving the feature denoiser and the prototypical learner to decipher the intricacies of motion.
arXiv Detail & Related papers (2024-06-07T15:10:33Z)
- MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model [15.418115686945056]
LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in point clouds of the current scan using motion information from previous scans.
We propose a novel LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model, termed MambaMOS.
arXiv Detail & Related papers (2024-04-19T11:17:35Z)
- MF-MOS: A Motion-Focused Model for Moving Object Segmentation [10.533968185642415]
Moving object segmentation (MOS) provides a reliable solution for detecting traffic participants.
Previous methods capture motion features from the range images directly.
We propose MF-MOS, a novel motion-focused model with a dual-branch structure for LiDAR moving object segmentation.
arXiv Detail & Related papers (2024-01-30T13:55:56Z)
- Delving into Motion-Aware Matching for Monocular 3D Object Tracking [81.68608983602581]
We find that the motion cue of objects along different time frames is critical in 3D multi-object tracking.
We propose MoMA-M3T, a framework that mainly consists of three motion-aware components.
We conduct extensive experiments on the nuScenes and KITTI datasets to demonstrate our MoMA-M3T achieves competitive performance against state-of-the-art methods.
arXiv Detail & Related papers (2023-08-22T17:53:58Z)
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
- MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird's Eye View based Appearance and Motion Features [5.186531650935954]
We present MotionBEV, a fast and accurate framework for LiDAR moving object segmentation.
Our approach converts 3D LiDAR scans into a 2D polar BEV representation to improve computational efficiency.
We employ a dual-branch network bridged by the Appearance-Motion Co-attention Module (AMCM) to adaptively fuse the spatio-temporal information from appearance and motion features.
arXiv Detail & Related papers (2023-05-12T09:28:09Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- Exploring Motion and Appearance Information for Temporal Sentence Grounding [52.01687915910648]
We propose a Motion-Appearance Reasoning Network (MARN) to solve temporal sentence grounding.
We develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations.
Our proposed MARN outperforms previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-01-03T02:44:18Z)
- The Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all the monocular 3D object detectors on the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z)
- Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for closely hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
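Several entries above report results as an IoU percentage for moving objects. As a rough illustration, the intersection-over-union of the moving class can be computed from per-point binary labels as TP / (TP + FP + FN). The sketch below assumes labels encoded as 1 = moving and 0 = static; the function name and the choice to return NaN when no moving points exist are illustrative assumptions, not part of any benchmark toolkit.

```python
def moving_iou(pred, target):
    """IoU of the 'moving' class for per-point labels (1 = moving, 0 = static)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    pairs = list(zip(pred, target))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # moving, predicted moving
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # static, predicted moving
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # moving, predicted static
    denom = tp + fp + fn
    # Assumed convention: the metric is undefined if nothing is moving at all.
    return tp / denom if denom else float("nan")

if __name__ == "__main__":
    score = moving_iou([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
    print(f"moving IoU: {100 * score:.1f}%")
```

Static points that are correctly predicted as static do not enter the formula, so the score is not inflated by the large static background in a typical scan.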
This list is automatically generated from the titles and abstracts of the papers in this site.