LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection
- URL: http://arxiv.org/abs/2406.07023v2
- Date: Wed, 12 Jun 2024 02:26:46 GMT
- Title: LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection
- Authors: Jiahua Xu, Si Zuo, Chenfeng Wei, Wei Zhou
- Abstract summary: LiSD is a voxel-based encoder-decoder framework that addresses both segmentation and detection tasks.
It achieves the state-of-the-art performance of 83.3% mIoU on the nuScenes segmentation benchmark for lidar-only methods.
- Score: 6.813145466843275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid proliferation of autonomous driving, there has been a heightened focus on the research of lidar-based 3D semantic segmentation and object detection methodologies, aiming to ensure the safety of traffic participants. In recent decades, learning-based approaches have emerged, demonstrating remarkable performance gains in comparison to conventional algorithms. However, the segmentation and detection tasks have traditionally been examined in isolation to achieve the best precision. To this end, we propose an efficient multi-task learning framework named LiSD which can address both segmentation and detection tasks, aiming to optimize the overall performance. Our proposed LiSD is a voxel-based encoder-decoder framework that contains a hierarchical feature collaboration module and a holistic information aggregation module. Different integration methods are adopted to keep sparsity in segmentation while densifying features for query initialization in detection. Besides, cross-task information is utilized in an instance-aware refinement module to obtain more accurate predictions. Experimental results on the nuScenes dataset and Waymo Open Dataset demonstrate the effectiveness of our proposed model. It is worth noting that LiSD achieves the state-of-the-art performance of 83.3% mIoU on the nuScenes segmentation benchmark for lidar-only methods.
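The headline figure above is reported as mIoU (mean intersection over union), the standard metric for semantic segmentation. As a minimal sketch of how that metric is computed from per-point labels (the function name `mean_iou` and the toy labels are illustrative assumptions, not the paper's or any benchmark's evaluation code):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or ground truth.

    Illustrative helper only; official benchmark toolkits may treat
    ignored or absent classes differently.
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    tp = np.diag(conf)                            # correctly labeled points per class
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp

    valid = union > 0                             # skip classes absent from both
    return float((tp[valid] / union[valid]).mean())

# Toy example with 3 classes and 6 points.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

Per-class IoU is the number of true positives divided by the union of predicted and ground-truth points for that class; mIoU averages this over classes.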
Related papers
- Frequency-based Matcher for Long-tailed Semantic Segmentation [22.199174076366003]
We focus on a relatively under-explored task setting, long-tailed semantic segmentation (LTSS).
We propose a dual-metric evaluation system and construct the LTSS benchmark to demonstrate the performance of semantic segmentation methods and long-tailed solutions.
We also propose a transformer-based algorithm to improve LTSS, frequency-based matcher, which solves the oversuppression problem by one-to-many matching.
arXiv Detail & Related papers (2024-06-06T09:57:56Z)
- An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models [25.28234439927537]
MMDetection3D-lidarseg is a comprehensive toolbox for efficient training and evaluation of state-of-the-art LiDAR segmentation models.
We support a wide range of segmentation models and integrate advanced data augmentation techniques to enhance robustness and efficiency.
By fostering a unified framework, MMDetection3D-lidarseg streamlines development and benchmarking, setting new standards for research and application.
arXiv Detail & Related papers (2024-05-23T17:59:57Z)
- EffiPerception: an Efficient Framework for Various Perception Tasks [6.1522068855729755]
EffiPerception is a framework to explore common learning patterns and increase the module.
It achieves high accuracy and robustness at relatively low memory cost across several perception tasks.
EffiPerception shows an overall improvement in accuracy, speed, and memory across four detection and segmentation tasks.
arXiv Detail & Related papers (2024-03-18T23:22:37Z)
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- RAIS: Robust and Accurate Interactive Segmentation via Continual Learning [16.382862088005087]
We propose RAIS, a robust and accurate architecture for interactive segmentation with continuous learning.
For efficient learning on the test set, we propose a novel optimization strategy to update global and local parameters.
Our method also shows its robustness in the datasets of remote sensing and medical imaging.
arXiv Detail & Related papers (2022-10-20T03:05:44Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
- Triggering Failures: Out-Of-Distribution detection by learning from local adversarial attacks in Semantic Segmentation [76.2621758731288]
We tackle the detection of out-of-distribution (OOD) objects in semantic segmentation.
Our main contribution is a new OOD detection architecture called ObsNet, associated with a dedicated training scheme based on Local Adversarial Attacks (LAA).
We show it obtains top performances both in speed and accuracy when compared to ten recent methods of the literature on three different datasets.
arXiv Detail & Related papers (2021-08-03T17:09:56Z)
- Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach, with a simple, yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity; (iv) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube
arXiv Detail & Related papers (2020-06-22T17:55:59Z)
- Bi-Directional Attention for Joint Instance and Semantic Segmentation in Point Clouds [9.434847591440485]
We build a Bi-Directional Attention module on backbone neural networks for 3D point cloud perception.
It uses a similarity matrix computed from the features of one task to help aggregate non-local information for the other task.
From comprehensive experiments and ablation studies on the S3DIS dataset and the PartNet dataset, the superiority of our method is verified.
arXiv Detail & Related papers (2020-03-11T17:16:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.