Related papers: DynStatF: An Efficient Feature Fusion Strategy for LiDAR 3D Object Detection

URL: http://arxiv.org/abs/2305.15219v1
Date: Wed, 24 May 2023 15:00:01 GMT
Title: DynStatF: An Efficient Feature Fusion Strategy for LiDAR 3D Object Detection
Authors: Yao Rong, Xiangyu Wei, Tianwei Lin, Yueyu Wang, Enkelejda Kasneci
Abstract summary: Augmenting LiDAR input with multiple previous frames provides richer semantic information. Crowded point clouds in multi-frames can hurt the precise position information due to the motion blur and inaccurate point projection. We propose a novel feature fusion strategy, DynStaF, which enhances the rich semantic information provided by the multi-frame.
Score: 21.573784416916546
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Augmenting LiDAR input with multiple previous frames provides richer semantic information and thus boosts performance in 3D object detection. However, crowded point clouds in multi-frames can hurt the precise position information due to motion blur and inaccurate point projection. In this work, we propose a novel feature fusion strategy, DynStaF (Dynamic-Static Fusion), which enhances the rich semantic information provided by the multi-frame (dynamic branch) with the accurate location information from the current single-frame (static branch). To effectively extract and aggregate complementary features, DynStaF contains two modules, Neighborhood Cross Attention (NCA) and Dynamic-Static Interaction (DSI), operating through a dual pathway architecture. NCA takes the features in the static branch as queries and the features in the dynamic branch as keys (values). When computing the attention, we address the sparsity of point clouds and take only neighborhood positions into consideration. NCA fuses two features at different feature map scales, followed by DSI providing the comprehensive interaction. To analyze our proposed strategy DynStaF, we conduct extensive experiments on the nuScenes dataset. On the test set, DynStaF increases the performance of PointPillars in NDS by a large margin from 57.7% to 61.6%. When combined with CenterPoint, our framework achieves 61.0% mAP and 67.7% NDS, leading to state-of-the-art performance without bells and whistles.
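
The neighborhood cross attention idea in the abstract (static-branch features as queries, dynamic-branch features as keys and values, attention restricted to nearby positions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the square `kernel_size` window, the 2D grid layout, and the absence of learned projections or multiple heads are all assumptions made for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
import math


def neighborhood_cross_attention(static_feat, dynamic_feat, kernel_size=3):
    """Hypothetical sketch of neighborhood-restricted cross attention.

    static_feat, dynamic_feat: H x W x C nested lists on the same grid.
    Queries come from static_feat; keys and values come from dynamic_feat.
    No learned projections are applied (an assumption, for brevity).
    """
    h, w = len(static_feat), len(static_feat[0])
    c = len(static_feat[0][0])
    r = kernel_size // 2
    scale = 1.0 / math.sqrt(c)  # standard scaled dot-product factor

    out = [[None] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            query = static_feat[i][j]
            # Only in-bounds positions inside the local window take part.
            neighbors = [
                dynamic_feat[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            scores = [
                scale * sum(q * k for q, k in zip(query, key))
                for key in neighbors
            ]
            # Numerically stable softmax over the neighborhood.
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            # Weighted sum of the neighboring dynamic features (values).
            out[i][j] = [
                sum(e * v[ch] for e, v in zip(exps, neighbors)) / total
                for ch in range(c)
            ]
    return out
```

With `kernel_size=1` each query attends only to the dynamic feature at its own position, so the output reduces to `dynamic_feat`. Larger windows blend in nearby dynamic features according to their similarity to the static query.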

Related papers

State Space Model Meets Transformer: A New Paradigm for 3D Object Detection [33.49952392298874]
We propose a new 3D object DEtection paradigm with an interactive STate space model (DEST). In the interactive SSM, we design a novel state-dependent SSM parameterization method that enables system states to effectively serve as queries in 3D indoor detection tasks. Our method improves the GroupFree baseline in terms of AP50 on ScanNet V2 and SUN RGB-D datasets.
arXiv Detail & Related papers (2025-03-18T17:58:03Z)
PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
Integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN). PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
FASTC: A Fast Attentional Framework for Semantic Traversability Classification Using Point Cloud [7.711666704468952]
We address the problem of traversability assessment using point clouds. We propose a pillar feature extraction module that utilizes PointNet to capture features from point clouds organized in vertical volume. We then propose a new temporal attention module to fuse multi-frame information, which can properly handle the varying density problem of LiDAR point clouds.
arXiv Detail & Related papers (2024-06-24T12:01:55Z)
PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework to fuse information of RGB images and LiDAR point clouds at the points of interest (PoIs). Our approach maintains the view of each modality and obtains multi-modal features by computation-friendly projection and computation. We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds. Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation. We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed as Ret3D. At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules. With negligible extra overhead, Ret3D achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection [47.941714033657675]
3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics. We design TransPillars, a novel transformer-based feature aggregation technique that exploits temporal features of consecutive point cloud frames. Our proposed TransPillars achieves state-of-the-art performance as compared to existing multi-frame detection approaches.
arXiv Detail & Related papers (2022-08-04T15:41:43Z)
MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection [44.619039588252676]
We present a flexible and high-performance 3D detection framework, named MPPNet, for 3D temporal object detection with point cloud sequences. We propose a novel three-hierarchy framework with proxy points for multi-frame feature encoding and interactions to achieve better detection. Our approach outperforms state-of-the-art methods with large margins when applied to both short (e.g., 4-frame) and long (e.g., 16-frame) point cloud sequences.
arXiv Detail & Related papers (2022-05-12T09:38:42Z)
Background-Aware 3D Point Cloud Segmentation with Dynamic Point Feature Aggregation [12.093182949686781]
We propose a novel 3D point cloud learning network, referred to as Dynamic Point Feature Aggregation Network (DPFA-Net). DPFA-Net has two variants for semantic segmentation and classification of 3D point clouds. It achieves the state-of-the-art overall accuracy score for semantic segmentation on the S3DIS dataset.
arXiv Detail & Related papers (2021-11-14T05:46:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.