MASS: Multi-Attentional Semantic Segmentation of LiDAR Data for Dense
Top-View Understanding
- URL: http://arxiv.org/abs/2107.00346v1
- Date: Thu, 1 Jul 2021 10:19:32 GMT
- Title: MASS: Multi-Attentional Semantic Segmentation of LiDAR Data for Dense
Top-View Understanding
- Authors: Kunyu Peng, Juncong Fei, Kailun Yang, Alina Roitberg, Jiaming Zhang,
Frank Bieder, Philipp Heidenreich, Christoph Stiller, Rainer Stiefelhagen
- Abstract summary: We introduce MASS - a Multi-Attentional Semantic model for dense top-view understanding of driving scenes.
Our framework operates on pillar- and occupancy features and comprises three attention-based building blocks.
Our model is shown to be very effective for 3D object detection validated on the KITTI-3D dataset.
- Score: 27.867824780748606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: At the heart of all automated driving systems is the ability to sense the
surroundings, e.g., through semantic segmentation of LiDAR sequences, which
has experienced remarkable progress due to the release of large datasets such
as SemanticKITTI and nuScenes-LidarSeg. While most previous works focus on sparse
segmentation of the LiDAR input, dense output masks provide self-driving cars
with almost complete environment information. In this paper, we introduce MASS
- a Multi-Attentional Semantic Segmentation model specifically built for dense
top-view understanding of driving scenes. Our framework operates on pillar-
and occupancy features and comprises three attention-based building blocks: (1)
a keypoint-driven graph attention, (2) an LSTM-based attention computed from a
vector embedding of the spatial input, and (3) a pillar-based attention,
resulting in a dense 360-degree segmentation mask. With extensive experiments
on both SemanticKITTI and nuScenes-LidarSeg, we quantitatively demonstrate the
effectiveness of our model, outperforming the state of the art by 19.0% on
SemanticKITTI and reaching 32.7% in mIoU on nuScenes-LidarSeg, where MASS is
the first work addressing the dense segmentation task. Furthermore, our
multi-attention model is shown to be very effective for 3D object detection
validated on the KITTI-3D dataset, showcasing its high generalizability to
other tasks related to 3D vision.
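The listing above contains no reference code, so the following is a minimal sketch, assuming PyTorch, of how pillar/occupancy BEV features and the three attention blocks named in the abstract (keypoint-driven graph attention, LSTM-based attention over a vector embedding, and pillar-based attention) could be composed into a dense top-view segmentation head. All module names, the keypoint selection heuristic, tensor shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical multi-attentional BEV segmentation head (illustrative only):
# pillar/occupancy features pass through three attention blocks and a 1x1
# classifier that emits a dense top-view class mask.
import torch
import torch.nn as nn


class KeypointGraphAttention(nn.Module):
    """Simplified keypoint-driven attention: the strongest BEV cells act as
    keypoints and every cell attends over them (a stand-in for graph attention)."""

    def __init__(self, channels: int, num_keypoints: int = 128):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.to_q = nn.Conv2d(channels, channels, 1)
        self.to_k = nn.Linear(channels, channels)
        self.to_v = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, H, W)
        B, C, H, W = x.shape
        flat = x.flatten(2).transpose(1, 2)                # (B, H*W, C)
        n_kp = min(self.num_keypoints, H * W)
        idx = flat.norm(dim=-1).topk(n_kp, dim=1).indices  # (B, n_kp)
        keypts = torch.gather(flat, 1, idx.unsqueeze(-1).expand(-1, -1, C))
        q = self.to_q(x).flatten(2).transpose(1, 2)        # (B, H*W, C)
        attn = torch.softmax(
            q @ self.to_k(keypts).transpose(1, 2) / C ** 0.5, dim=-1)
        out = attn @ self.to_v(keypts)                     # (B, H*W, C)
        return x + out.transpose(1, 2).reshape(B, C, H, W)


class LSTMAttention(nn.Module):
    """Attention weights produced by an LSTM over row-wise embeddings of the
    BEV grid (one possible reading of the description above)."""

    def __init__(self, channels: int):
        super().__init__()
        self.lstm = nn.LSTM(channels, channels, batch_first=True)
        self.gate = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (B, C, H, W)
        rows = x.mean(dim=3).transpose(1, 2)                # (B, H, C)
        h, _ = self.lstm(rows)                              # (B, H, C)
        weights = torch.sigmoid(self.gate(h))               # (B, H, C)
        return x * weights.transpose(1, 2).unsqueeze(-1)    # broadcast over W


class PillarAttention(nn.Module):
    """Per-pillar channel reweighting via a 1x1 gating convolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)


class MultiAttentionSegHead(nn.Module):
    """Chains the three attention blocks and predicts a dense BEV mask."""

    def __init__(self, channels: int = 64, num_classes: int = 20):
        super().__init__()
        self.kp_attn = KeypointGraphAttention(channels)
        self.lstm_attn = LSTMAttention(channels)
        self.pillar_attn = PillarAttention(channels)
        self.classifier = nn.Conv2d(channels, num_classes, 1)

    def forward(self, pillar_feats: torch.Tensor) -> torch.Tensor:
        x = self.kp_attn(pillar_feats)       # (B, C, H, W) BEV pillar features
        x = self.lstm_attn(x)
        x = self.pillar_attn(x)
        return self.classifier(x)            # (B, num_classes, H, W) dense mask
```

In this reading the three blocks are applied sequentially to the same BEV feature map; the paper may combine them differently (e.g., in parallel branches with feature fusion).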
Related papers
- S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving [12.406655155106424]
We propose S3PT, a novel scene semantics and structure guided clustering method that provides more scene-consistent objectives for self-supervised training.
Our contributions are threefold: First, we incorporate semantic distribution consistent clustering to encourage better representation of rare classes such as motorcycles or animals.
Second, we introduce object diversity consistent spatial clustering, to handle imbalanced and diverse object sizes, ranging from large background areas to small objects such as pedestrians and traffic signs.
Third, we propose a depth-guided spatial clustering to regularize learning based on geometric information of the scene, thus further refining region separation on the feature level.
arXiv Detail & Related papers (2024-10-30T15:00:06Z) - Leveraging Large-Scale Pretrained Vision Foundation Models for
Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that combines all the results via voting (a minimal voting sketch appears after this related-papers list).
arXiv Detail & Related papers (2023-11-03T15:41:15Z) - LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [12.713417063678335]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to transfer semantic features for improved object detection selectively.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z) - Semi-Weakly Supervised Object Kinematic Motion Prediction [56.282759127180306]
Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters.
We propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile parts parameters.
The network predictions yield a large-scale set of 3D objects with pseudo-labeled mobility information.
arXiv Detail & Related papers (2023-03-31T02:37:36Z) - LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for
Autonomous Driving [34.119642131912485]
We present a more artful framework, LiDAR-guided Weakly Supervised Instance Segmentation (LWSIS).
LWSIS uses off-the-shelf 3D data, i.e., point clouds together with 3D boxes, as natural weak supervision for training 2D image instance segmentation models.
Our LWSIS not only exploits the complementary information in multimodal data during training, but also significantly reduces the cost of the dense 2D masks.
arXiv Detail & Related papers (2022-12-07T08:08:01Z) - Towards Multimodal Multitask Scene Understanding Models for Indoor
Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z) - Three Ways to Improve Semantic Segmentation with Self-Supervised Depth
Estimation [90.87105131054419]
We present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset, where all three modules demonstrate significant performance gains.
arXiv Detail & Related papers (2020-12-19T21:18:03Z) - LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner.
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z) - Improving Point Cloud Semantic Segmentation by Learning 3D Object
Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z) - SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D
Vehicle Detection from Point Cloud [39.99118618229583]
We propose a unified model SegVoxelNet to address the above two problems.
A semantic context encoder is proposed to leverage the free-of-charge semantic segmentation masks in the bird's eye view.
A novel depth-aware head is designed to explicitly model the distribution differences and each part of the depth-aware head is made to focus on its own target detection range.
arXiv Detail & Related papers (2020-02-13T02:42:31Z)
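As noted in the "Leveraging Large-Scale Pretrained Vision Foundation Models" entry above, 2D predictions from several foundation models can be fused into 3D pseudo labels by per-point voting. The sketch below is a hypothetical NumPy illustration of such majority voting; the function name, array shapes, and the ignore-label convention are assumptions rather than that paper's code.

```python
# Hypothetical per-point majority voting for 3D pseudo-label fusion:
# several 2D models each assign a class to every projected LiDAR point,
# and the per-point majority vote becomes the 3D pseudo label.
import numpy as np


def fuse_pseudo_labels(per_model_labels: np.ndarray,
                       num_classes: int,
                       ignore_label: int = -1) -> np.ndarray:
    """per_model_labels: (num_models, num_points) int array of predicted
    classes per point; entries equal to ignore_label cast no vote.
    Returns (num_points,) fused labels, ignore_label where nobody voted."""
    num_models, num_points = per_model_labels.shape
    votes = np.zeros((num_points, num_classes), dtype=np.int64)
    for m in range(num_models):
        labels = per_model_labels[m]
        valid = labels != ignore_label
        # accumulate one vote per valid per-point prediction
        np.add.at(votes, (np.nonzero(valid)[0], labels[valid]), 1)
    fused = votes.argmax(axis=1)
    fused[votes.sum(axis=1) == 0] = ignore_label   # points with no votes
    return fused


# Example: three models vote on four points.
preds = np.array([[0, 1, 2, -1],
                  [0, 1, 1, -1],
                  [3, 1, 2, -1]])
print(fuse_pseudo_labels(preds, num_classes=4))    # -> [ 0  1  2 -1]
```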