SparseDet: A Simple and Effective Framework for Fully Sparse LiDAR-based 3D Object Detection
- URL: http://arxiv.org/abs/2406.10907v1
- Date: Sun, 16 Jun 2024 12:11:44 GMT
- Title: SparseDet: A Simple and Effective Framework for Fully Sparse LiDAR-based 3D Object Detection
- Authors: Lin Liu, Ziying Song, Qiming Xia, Feiyang Jia, Caiyan Jia, Lei Yang, Hongyu Pan,
- Abstract summary: LiDAR-based sparse 3D object detection plays a crucial role in autonomous driving applications.
Existing methods either use the features of a single central voxel as an object proxy, or treat an aggregated cluster of foreground points as an object proxy.
We propose SparseDet which designs sparse queries as object proxies.
- Score: 14.137208879448389
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LiDAR-based sparse 3D object detection plays a crucial role in autonomous driving applications due to its computational efficiency advantages. Existing methods either use the features of a single central voxel as an object proxy, or treat an aggregated cluster of foreground points as an object proxy. However, the former lacks the ability to aggregate contextual information, resulting in insufficient information expression in object proxies. The latter relies on multi-stage pipelines and auxiliary tasks, which reduce the inference speed. To maintain the efficiency of the sparse framework while fully aggregating contextual information, in this work, we propose SparseDet which designs sparse queries as object proxies. It introduces two key modules, the Local Multi-scale Feature Aggregation (LMFA) module and the Global Feature Aggregation (GFA) module, aiming to fully capture the contextual information, thereby enhancing the ability of the proxies to represent objects. Where LMFA sub-module achieves feature fusion across different scales for sparse key voxels %which does this through via coordinate transformations and using nearest neighbor relationships to capture object-level details and local contextual information, GFA sub-module uses self-attention mechanisms to selectively aggregate the features of the key voxels across the entire scene for capturing scene-level contextual information. Experiments on nuScenes and KITTI demonstrate the effectiveness of our method. Specifically, on nuScene, SparseDet surpasses the previous best sparse detector VoxelNeXt by 2.2\% mAP with 13.5 FPS, and on KITTI, it surpasses VoxelNeXt by 1.12\% $\mathbf{AP_{3D}}$ on hard level tasks with 17.9 FPS.
Related papers
- Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting [86.15347226865826]
We design a new end-to-end object-aware lifting approach, named Unified-Lift.
We augment each Gaussian point with an additional Gaussian-level feature learned using a contrastive loss to encode instance information.
We conduct experiments on three benchmarks: LERF-Masked, Replica, and Messy Rooms.
arXiv Detail & Related papers (2025-03-18T08:42:23Z) - $\textit{X}^2$-DFD: A framework for e${X}$plainable and e${X}$tendable Deepfake Detection [52.14468236527728]
We propose a novel framework called $X2$-DFD, consisting of three core modules.
The first module, Model Feature Assessment (MFA), measures the detection capabilities of forgery features intrinsic to MLLMs, and gives a descending ranking of these features.
The second module, Strong Feature Strengthening (SFS), enhances the detection and explanation capabilities by fine-tuning the MLLM on a dataset constructed based on the top-ranked features.
The third module, Weak Feature Supplementing (WFS), improves the fine-tuned MLLM's capabilities on lower-ranked features by integrating external dedicated
arXiv Detail & Related papers (2024-10-08T15:28:33Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN)
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - Learning Spatial-Semantic Features for Robust Video Object Segmentation [108.045326229865]
We propose a robust video object segmentation framework that learns spatial-semantic features and discriminative object queries.
The proposed method achieves state-of-the-art performance on benchmark data sets, including the DAVIS 2017 test (textbf87.8%), YoutubeVOS 2019 (textbf88.1%), MOSE val (textbf74.0%), and LVOS test (textbf73.0%)
arXiv Detail & Related papers (2024-07-10T15:36:00Z) - LR-FPN: Enhancing Remote Sensing Object Detection with Location Refined Feature Pyramid Network [2.028685490378346]
We propose a novel location refined feature pyramid network (LR-FPN) to enhance the extraction of shallow positional information.
Experiments on two large-scale remote sensing datasets demonstrate that the proposed LR-FPN is superior to state-of-the-art object detection approaches.
arXiv Detail & Related papers (2024-04-02T03:36:07Z) - Background Activation Suppression for Weakly Supervised Object
Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
New paradigm has emerged by generating a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z) - PSGformer: Enhancing 3D Point Cloud Instance Segmentation via Precise
Semantic Guidance [11.097083846498581]
PSGformer is a novel 3D instance segmentation network.
It incorporates two key advancements to enhance the performance of 3D instance segmentation.
It exceeds compared state-of-the-art methods by 2.2% on ScanNetv2 hidden test set in terms of mAP.
arXiv Detail & Related papers (2023-07-15T04:45:37Z) - MIANet: Aggregating Unbiased Instance and General Information for
Few-Shot Semantic Segmentation [6.053853367809978]
Existing few-shot segmentation methods are based on the meta-learning strategy and extract instance knowledge from a support set.
We propose a multi-information aggregation network (MIANet) that effectively leverages the general knowledge, i.e., semantic word embeddings, and instance information for accurate segmentation.
Experiments on PASCAL-5i and COCO-20i show that MIANet yields superior performance and set a new state-of-the-art.
arXiv Detail & Related papers (2023-05-23T09:36:27Z) - VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking [78.25819070166351]
We propose VoxelNext for fully sparse 3D object detection.
Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.
Our strong sparse convolutional network VoxelNeXt detects and tracks 3D objects through voxel features entirely.
arXiv Detail & Related papers (2023-03-20T17:40:44Z) - PV-RCNN++: Semantical Point-Voxel Feature Interaction for 3D Object
Detection [22.6659359032306]
This paper proposes a novel object detection network by semantical point-voxel feature interaction, dubbed PV-RCNN++.
Experiments on the KITTI dataset show that PV-RCNN++ achieves 81.60$%$, 40.18$%$, 68.21$%$ 3D mAP on Car, Pedestrian, and Cyclist, achieving comparable or even better performance to the state-of-the-arts.
arXiv Detail & Related papers (2022-08-29T08:14:00Z) - VIN: Voxel-based Implicit Network for Joint 3D Object Detection and
Segmentation for Lidars [12.343333815270402]
A unified neural network structure is presented for joint 3D object detection and point cloud segmentation.
We leverage rich supervision from both detection and segmentation labels rather than using just one of them.
arXiv Detail & Related papers (2021-07-07T02:16:20Z) - LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner.
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z) - Cross-layer Feature Pyramid Network for Salient Object Detection [102.20031050972429]
We propose a novel Cross-layer Feature Pyramid Network to improve the progressive fusion in salient object detection.
The distributed features per layer own both semantics and salient details from all other layers simultaneously, and suffer reduced loss of important information.
arXiv Detail & Related papers (2020-02-25T14:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.