Semantic-Supervised Spatial-Temporal Fusion for LiDAR-based 3D Object Detection
- URL: http://arxiv.org/abs/2503.10579v2
- Date: Sat, 15 Mar 2025 06:23:19 GMT
- Title: Semantic-Supervised Spatial-Temporal Fusion for LiDAR-based 3D Object Detection
- Authors: Chaoqun Wang, Xiaobin Hong, Wenzhong Li, Ruimao Zhang
- Abstract summary: LiDAR-based 3D object detection presents significant challenges due to the inherent sparsity of LiDAR points. We propose a novel fusion module to relieve the spatial misalignment caused by object motion over time. We also propose a Semantic Injection method to enrich the sparse LiDAR data by injecting point-wise semantic labels.
- Score: 22.890432295751086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LiDAR-based 3D object detection presents significant challenges due to the inherent sparsity of LiDAR points. A common solution is to aggregate long-term temporal LiDAR data to densify the inputs; however, efficiently leveraging such spatial-temporal information remains an open problem. In this paper, we propose a novel Semantic-Supervised Spatial-Temporal Fusion (ST-Fusion) method, which introduces a fusion module that relieves the spatial misalignment caused by object motion over time, together with feature-level semantic supervision that fully unlocks the capacity of the fusion module. Specifically, ST-Fusion consists of a Spatial Aggregation (SA) module and a Temporal Merging (TM) module. The SA module employs convolutional layers with progressively expanding receptive fields to aggregate object features from local regions, alleviating spatial misalignment; the TM module dynamically extracts object features from preceding frames via an attention mechanism to build a comprehensive sequential representation. For the semantic supervision, we further propose a Semantic Injection method that enriches the sparse LiDAR data by injecting point-wise semantic labels; the enriched data are used to train a teacher model, which provides a feature-level reconstruction target supervised by the proposed object-aware loss. Extensive experiments on various LiDAR-based detectors demonstrate the effectiveness and universality of our proposal, yielding an improvement of approximately +2.8% NDS on the nuScenes benchmark.
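The abstract describes the mechanism but no code; as a rough, hedged illustration, the PyTorch sketch below wires an SA-style module (parallel convolutions with growing dilation approximating "progressively expanding receptive fields") to a TM-style module (multi-head attention with the current frame as query and preceding frames as keys/values). It assumes BEV feature maps of shape (B, C, H, W) per frame and at least one preceding frame; all class and parameter names (SpatialAggregation, TemporalMerging, STFusion, dilations) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SpatialAggregation(nn.Module):
    """Hypothetical SA module: parallel 3x3 convs with growing dilation
    stand in for the 'progressively expanding receptive fields'."""

    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):  # x: (B, C, H, W)
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


class TemporalMerging(nn.Module):
    """Hypothetical TM module: the current frame attends to features
    gathered from all preceding frames."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, cur, prev):  # cur: (B, C, H, W); prev: list of same
        B, C, H, W = cur.shape
        q = cur.flatten(2).transpose(1, 2)  # (B, HW, C)
        kv = torch.cat([p.flatten(2).transpose(1, 2) for p in prev], dim=1)
        out, _ = self.attn(q, kv, kv)  # (B, HW, C)
        return out.transpose(1, 2).reshape(B, C, H, W)


class STFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.sa = SpatialAggregation(channels)
        self.tm = TemporalMerging(channels)

    def forward(self, frames):  # frames[-1] is the current frame
        aligned = [self.sa(f) for f in frames]
        return self.tm(aligned[-1], aligned[:-1])


# Smoke test: three past frames plus the current one.
fusion = STFusion(channels=128)
frames = [torch.randn(2, 128, 64, 64) for _ in range(4)]
print(fusion(frames).shape)  # torch.Size([2, 128, 64, 64])
```

The Semantic Injection stage is deliberately omitted: per the abstract it is a training-time construct, in which a teacher model trained on semantically enriched points supplies a feature-level reconstruction target under the object-aware loss.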
Related papers
- MS-Occ: Multi-Stage LiDAR-Camera Fusion for 3D Semantic Occupancy Prediction [15.656771219382076]
MS-Occ is a novel multi-stage LiDAR-camera fusion framework.
It integrates LiDAR's geometric fidelity with camera-based semantic richness.
Experiments show MS-Occ achieves an Intersection over Union (IoU) of 32.1% and a mean IoU (mIoU) of 25.3%.
arXiv Detail & Related papers (2025-04-22T13:33:26Z)
- DiffMOD: Progressive Diffusion Point Denoising for Moving Object Detection in Remote Sensing [40.607660968380394]
Moving object detection (MOD) in remote sensing is significantly challenged by low resolution, extremely small object sizes, and complex noise interference.
Current deep learning-based MOD methods rely on probability density estimation, which restricts flexible information interaction between objects.
We propose a point-based MOD method for remote sensing that iteratively recovers moving object centers from sparse noisy points (a toy version of such an iterative refinement loop is sketched after this entry).
arXiv Detail & Related papers (2025-04-14T14:44:52Z)
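DiffMOD's actual network conditions on image features and a diffusion schedule; the toy sketch below shows only the generic shape of such an iterative point-refinement loop, with a small MLP predicting per-point offsets given a normalized step index. PointDenoiser, steps, and hidden are hypothetical names, not the paper's API.

```python
import torch
import torch.nn as nn


class PointDenoiser(nn.Module):
    """Toy iterative refiner: repeatedly nudges noisy 2D candidate
    points toward object centers via predicted offsets."""

    def __init__(self, steps: int = 8, hidden: int = 64):
        super().__init__()
        self.steps = steps
        self.net = nn.Sequential(  # (x, y, t) -> (dx, dy)
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    @torch.no_grad()
    def refine(self, pts):  # pts: (N, 2) noisy candidate points
        for t in range(self.steps):
            t_feat = torch.full(
                (pts.shape[0], 1), t / self.steps, device=pts.device
            )
            pts = pts + self.net(torch.cat([pts, t_feat], dim=1))
        return pts  # refined center estimates
```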
- Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences [25.74000325019015]
We introduce a novel LiDAR 3D object detection framework, named LiSTM, to facilitate spatial-temporal feature learning with cross-frame motion forecasting information.
Experiments on the Waymo and nuScenes datasets demonstrate that the proposed framework achieves superior 3D detection performance.
arXiv Detail & Related papers (2024-09-06T16:29:04Z)
- DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation [66.8732965660931]
This paper introduces DSLO, a 3D point cloud sequence learning model for LiDAR odometry based on inconsistent spatio-temporal propagation.
It consists of a pyramid structure with a sequential pose module, a hierarchical pose refinement module, and a temporal feature propagation module.
arXiv Detail & Related papers (2024-09-01T15:12:48Z)
- 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views (the contrastive building block such frameworks rest on is sketched after this entry).
arXiv Detail & Related papers (2024-07-08T17:59:54Z)
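SuperFlow's full objective spans consecutive frames and a view-consistency term; the snippet below is only the standard InfoNCE building block that LiDAR-camera contrastive pretraining of this kind typically rests on, assuming row i of each feature matrix corresponds to the same matched point/pixel pair.

```python
import torch
import torch.nn.functional as F


def info_nce(lidar_feats, cam_feats, temperature: float = 0.07):
    """Generic InfoNCE between N matched LiDAR/camera feature pairs,
    each of shape (N, D); not SuperFlow's exact loss."""
    z1 = F.normalize(lidar_feats, dim=1)
    z2 = F.normalize(cam_feats, dim=1)
    logits = z1 @ z2.t() / temperature  # (N, N) cosine similarities
    targets = torch.arange(z1.shape[0], device=z1.device)  # diagonal matches
    return F.cross_entropy(logits, targets)
```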
- Large receptive field strategy and important feature extraction strategy in 3D object detection [6.3948571459793975]
This study focuses on key challenges in 3D object detection.
To expand the receptive field of a 3D convolutional kernel, we introduce the Dynamic Feature Fusion Module.
This module adaptively expands the kernel's receptive field while keeping the computational load acceptable (one plausible reading is sketched after this entry).
arXiv Detail & Related papers (2024-01-22T13:01:28Z)
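The entry does not spell out how the Dynamic Feature Fusion Module expands the receptive field; one plausible reading, sketched below with dense Conv3d for simplicity (sparse convolutions are more common on voxel grids in practice), gates several dilated branches with learned, input-dependent weights. All names are illustrative.

```python
import torch
import torch.nn as nn


class DynamicFeatureFusion(nn.Module):
    """Speculative sketch: input-dependent gates blend branches with
    increasingly dilated (hence larger-receptive-field) 3D convs."""

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.gate = nn.Sequential(  # global context -> branch weights
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, len(dilations), 1),
            nn.Softmax(dim=1),
        )

    def forward(self, x):  # x: (B, C, D, H, W)
        w = self.gate(x)  # (B, K, 1, 1, 1) per-branch weights
        feats = torch.stack([b(x) for b in self.branches], dim=1)
        return (w.unsqueeze(2) * feats).sum(dim=1)  # (B, C, D, H, W)
```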
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed Ret3D.
At the core of Ret3D is the use of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector; once trained, it needs only LiDAR data at inference.
Experimental results on the nuScenes dataset show that the approach outperforms all state-of-the-art LiDAR-only 3D detectors (a generic distillation objective of this kind is sketched after this entry).
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
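The paper's concrete simulation losses are not given in this summary; the function below is a generic feature- plus response-mimicry objective of the kind such teacher-student setups use, with the LiDAR-image teacher frozen and consulted only during training. The function name and the alpha/tau weighting are assumptions.

```python
import torch.nn.functional as F


def distillation_loss(student_feats, teacher_feats,
                      student_logits, teacher_logits,
                      alpha: float = 0.5, tau: float = 2.0):
    """Generic stand-in: MSE on intermediate features plus
    temperature-softened KL divergence on detection logits."""
    feat_loss = F.mse_loss(student_feats, teacher_feats)
    resp_loss = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau  # tau^2 rescaling keeps gradient magnitudes comparable
    return alpha * feat_loss + (1 - alpha) * resp_loss
```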
- SIENet: Spatial Information Enhancement Network for 3D Object Detection from Point Cloud [20.84329063509459]
LiDAR-based 3D object detection has an immense influence on autonomous vehicles.
Due to the intrinsic properties of LiDAR, fewer points are collected from objects farther away from the sensor.
To address the challenge, we propose a novel two-stage 3D object detection framework, named SIENet.
arXiv Detail & Related papers (2021-03-29T07:45:09Z)
- LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention [100.52873557168637]
3D object detectors usually focus on single-frame detection, ignoring the information in consecutive point cloud frames.
In this paper, we propose an end-to-end online 3D video object detector that operates on point sequences (a generic message-passing round between proposals is sketched after this entry).
arXiv Detail & Related papers (2020-04-03T06:06:52Z)
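As a loose illustration of the graph-based message passing this detector is built around, the sketch below runs one round of messages between proposal nodes over a given adjacency matrix; the paper's actual module also involves spatiotemporal transformer attention, which is omitted here. Class and argument names are hypothetical.

```python
import torch
import torch.nn as nn


class ProposalMessagePassing(nn.Module):
    """One round of message passing between N proposal nodes; a
    generic stand-in, not the paper's implementation."""

    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)  # message from each node pair
        self.upd = nn.GRUCell(dim, dim)     # node state update

    def forward(self, h, adj):  # h: (N, D) features; adj: (N, N) 0/1
        N, D = h.shape
        pair = torch.cat(
            [h.unsqueeze(1).expand(N, N, D), h.unsqueeze(0).expand(N, N, D)],
            dim=-1,
        )  # (N, N, 2D) receiver/sender concatenation
        m = torch.relu(self.msg(pair)) * adj.unsqueeze(-1)  # mask non-edges
        return self.upd(m.sum(dim=1), h)  # aggregate, then GRU update
```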