LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection
- URL: http://arxiv.org/abs/2406.07023v2
- Date: Wed, 12 Jun 2024 02:26:46 GMT
- Title: LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection
- Authors: Jiahua Xu, Si Zuo, Chenfeng Wei, Wei Zhou
- Abstract summary: LiSD is a voxel-based encoder-decoder framework that addresses both segmentation and detection tasks.
It achieves state-of-the-art performance among lidar-only methods, with 83.3% mIoU on the nuScenes segmentation benchmark.
- Score: 6.813145466843275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid proliferation of autonomous driving, there has been a heightened focus on research into lidar-based 3D semantic segmentation and object detection methodologies, aiming to ensure the safety of traffic participants. In recent decades, learning-based approaches have emerged, demonstrating remarkable performance gains in comparison to conventional algorithms. However, the segmentation and detection tasks have traditionally been examined in isolation, each tuned for its own best precision. To address this, we propose an efficient multi-task learning framework named LiSD that handles both segmentation and detection, aiming to optimize overall performance. Our proposed LiSD is a voxel-based encoder-decoder framework that contains a hierarchical feature collaboration module and a holistic information aggregation module. Different integration methods are adopted to keep sparsity in segmentation while densifying features for query initialization in detection. Besides, cross-task information is utilized in an instance-aware refinement module to obtain more accurate predictions. Experimental results on the nuScenes dataset and the Waymo Open Dataset demonstrate the effectiveness of the proposed model. Notably, LiSD achieves state-of-the-art performance among lidar-only methods, with 83.3% mIoU on the nuScenes segmentation benchmark.
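To make the overall layout concrete, here is a minimal PyTorch sketch of a shared voxel encoder-decoder feeding a voxel-wise segmentation head and a detection branch that densifies features into a BEV map for query initialization. All names, channel sizes, and the use of dense 3D convolutions are assumptions for illustration; LiSD itself operates on sparse voxel features and adds the hierarchical feature collaboration, holistic information aggregation, and instance-aware refinement modules described above.

```python
# Minimal multi-task sketch: shared voxel encoder-decoder, a voxel-wise
# segmentation head, and a detection branch that densifies features into a
# BEV map to initialize object queries. Channel sizes, module names, and the
# dense 3D convolutions are illustrative assumptions only.
import torch
import torch.nn as nn


class VoxelEncoderDecoder(nn.Module):
    """Shared backbone: downsample, then upsample the voxel grid."""
    def __init__(self, in_ch=4, base_ch=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(in_ch, base_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(base_ch, 2 * base_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(2 * base_ch, base_ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(base_ch, base_ch, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, vox):                        # vox: (B, C, D, H, W)
        return self.dec(self.enc(vox))             # full-resolution voxel features


class MultiTaskLidarSketch(nn.Module):
    def __init__(self, num_classes=17, num_queries=200, base_ch=32):
        super().__init__()
        self.backbone = VoxelEncoderDecoder(base_ch=base_ch)
        # Segmentation head: per-voxel class logits, staying on the voxel grid.
        self.seg_head = nn.Conv3d(base_ch, num_classes, kernel_size=1)
        # Detection branch: collapse the height axis into a dense BEV map,
        # then take the top-K scoring BEV cells as initial object queries.
        self.bev_proj = nn.Conv2d(base_ch, base_ch, kernel_size=3, padding=1)
        self.query_score = nn.Conv2d(base_ch, 1, kernel_size=1)
        self.num_queries = num_queries

    def forward(self, vox):
        feat = self.backbone(vox)                           # (B, C, D, H, W)
        seg_logits = self.seg_head(feat)                    # voxel-wise segmentation
        bev = self.bev_proj(feat.mean(dim=2))               # densified BEV: (B, C, H, W)
        score = self.query_score(bev).flatten(1)            # (B, H*W) objectness
        topk = score.topk(self.num_queries, dim=1).indices
        c = bev.shape[1]
        cells = bev.flatten(2).transpose(1, 2)              # (B, H*W, C)
        queries = torch.gather(cells, 1, topk.unsqueeze(-1).expand(-1, -1, c))
        return seg_logits, queries                          # a detection head would decode the queries


if __name__ == "__main__":
    model = MultiTaskLidarSketch()
    voxels = torch.randn(1, 4, 32, 64, 64)                  # toy voxelized point cloud
    seg_logits, det_queries = model(voxels)
    print(seg_logits.shape, det_queries.shape)              # (1, 17, 32, 64, 64), (1, 200, 32)
```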
Related papers
- Frequency-based Matcher for Long-tailed Semantic Segmentation [22.199174076366003]
We focus on a relatively under-explored task setting, long-tailed semantic segmentation (LTSS).
We propose a dual-metric evaluation system and construct the LTSS benchmark to demonstrate the performance of semantic segmentation methods and long-tailed solutions.
We also propose a transformer-based algorithm to improve LTSS, the frequency-based matcher, which addresses the oversuppression problem via one-to-many matching (a hedged sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-06-06T09:57:56Z) - An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models [25.28234439927537]
MMDetection3D-lidarseg is a comprehensive toolbox for efficient training and evaluation of state-of-the-art LiDAR segmentation models.
We support a wide range of segmentation models and integrate advanced data augmentation techniques to enhance robustness and efficiency.
By fostering a unified framework, MMDetection3D-lidarseg streamlines development and benchmarking, setting new standards for research and application.
arXiv Detail & Related papers (2024-05-23T17:59:57Z) - EffiPerception: an Efficient Framework for Various Perception Tasks [6.1522068855729755]
EffiPerception is a framework that explores learning patterns common to different perception tasks.
It achieves strong accuracy and robustness with relatively low memory cost across several perception tasks.
Overall, it shows accuracy-speed-memory performance gains on four detection and segmentation tasks.
arXiv Detail & Related papers (2024-03-18T23:22:37Z) - Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z) - RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight cross-modal module.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z) - 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z) - RAIS: Robust and Accurate Interactive Segmentation via Continual Learning [16.382862088005087]
We propose RAIS, a robust and accurate architecture for interactive segmentation with continual learning.
For efficient learning on the test set, we propose a novel optimization strategy to update global and local parameters.
Our method also shows its robustness in the datasets of remote sensing and medical imaging.
arXiv Detail & Related papers (2022-10-20T03:05:44Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only on the source dataset and unavailable on the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z) - Triggering Failures: Out-Of-Distribution detection by learning from local adversarial attacks in Semantic Segmentation [76.2621758731288]
We tackle the detection of out-of-distribution (OOD) objects in semantic segmentation.
Our main contribution is a new OOD detection architecture called ObsNet, associated with a dedicated training scheme based on Local Adversarial Attacks (LAA).
We show it obtains top performances both in speed and accuracy when compared to ten recent methods of the literature on three different datasets.
arXiv Detail & Related papers (2021-08-03T17:09:56Z) - Bi-Directional Attention for Joint Instance and Semantic Segmentation in Point Clouds [9.434847591440485]
We build a Bi-Directional Attention module on backbone neural networks for 3D point cloud perception.
It uses a similarity matrix computed from the features of one task to help aggregate non-local information for the other task (a sketch of this idea appears after this list).
Comprehensive experiments and ablation studies on the S3DIS and PartNet datasets verify the superiority of our method.
arXiv Detail & Related papers (2020-03-11T17:16:07Z)
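The frequency-based matcher entry above mentions countering the oversuppression of tail classes through one-to-many matching. Below is a minimal, hypothetical sketch of that idea; the frequency-to-query-count rule, function names, and tensors are assumptions for illustration, not the paper's actual matcher.

```python
# Hypothetical one-to-many matcher: rarer (tail) classes receive more matched
# queries per ground-truth mask than frequent (head) classes, so their
# predictions are not suppressed by strict one-to-one assignment.
import torch


def one_to_many_match(cost: torch.Tensor, gt_freq: torch.Tensor, max_k: int = 4):
    """cost: (num_queries, num_gt) matching costs; gt_freq: (num_gt,) class frequencies in [0, 1]."""
    num_queries, num_gt = cost.shape
    used = torch.zeros(num_queries, dtype=torch.bool)
    assignments = []
    for j in range(num_gt):
        # Assumed rule: rarer classes get more queries, but always at least one.
        k = max(1, int(round((1.0 - gt_freq[j].item()) * max_k)))
        order = torch.argsort(cost[:, j])                     # cheapest queries first
        picked = [q.item() for q in order if not used[q]][:k]
        used[picked] = True
        assignments.append((j, picked))
    return assignments


if __name__ == "__main__":
    cost = torch.rand(100, 3)                  # toy query-to-ground-truth costs
    freq = torch.tensor([0.90, 0.05, 0.30])    # head, tail, and mid-frequency classes
    for gt, queries in one_to_many_match(cost, freq):
        print(f"gt {gt}: matched queries {queries}")
```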
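The Bi-Directional Attention entry above uses a similarity matrix from one task's features to aggregate non-local information for the other task, echoing LiSD's use of cross-task information. A minimal sketch of such bidirectional, cross-task aggregation, assuming per-point features and softmax-normalized similarities (the shapes and normalization are illustrative choices, not the published architecture):

```python
# Sketch of cross-task non-local aggregation: the similarity matrix computed
# from one branch's point features re-weights and aggregates the other
# branch's features, and vice versa.
import torch
import torch.nn.functional as F


def cross_task_aggregate(feat_a: torch.Tensor, feat_b: torch.Tensor):
    """feat_a, feat_b: (N, C) per-point features from two task branches."""
    scale = feat_a.shape[1] ** 0.5
    sim_a = F.softmax(feat_a @ feat_a.t() / scale, dim=-1)   # (N, N), measured from task A
    sim_b = F.softmax(feat_b @ feat_b.t() / scale, dim=-1)   # (N, N), measured from task B
    feat_b_out = feat_b + sim_a @ feat_b                     # B enriched with A-guided context
    feat_a_out = feat_a + sim_b @ feat_a                     # A enriched with B-guided context
    return feat_a_out, feat_b_out


if __name__ == "__main__":
    sem = torch.randn(1024, 64)    # toy semantic-branch features
    ins = torch.randn(1024, 64)    # toy instance-branch features
    sem_out, ins_out = cross_task_aggregate(sem, ins)
    print(sem_out.shape, ins_out.shape)                      # torch.Size([1024, 64]) each
```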
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.