Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D
Object Detection
- URL: http://arxiv.org/abs/2402.03634v2
- Date: Tue, 12 Mar 2024 07:38:34 GMT
- Title: Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D
Object Detection
- Authors: Feng Liu, Tengteng Huang, Qianjing Zhang, Haotian Yao, Chi Zhang, Fang
Wan, Qixiang Ye, Yanzhao Zhou
- Abstract summary: Ray Denoising is an innovative method that enhances detection accuracy by strategically sampling along camera rays to construct hard negative examples.
Ray Denoising is designed as a plug-and-play module, compatible with any DETR-style multi-view 3D detector.
It achieves a 1.9% improvement in mean Average Precision (mAP) over the state-of-the-art StreamPETR method on the NuScenes dataset.
- Score: 46.041193889845474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-view 3D object detection systems often struggle to generate
precise predictions due to the difficulty of estimating depth from images,
which leads to redundant and incorrect detections. Our paper presents Ray
Denoising, an
innovative method that enhances detection accuracy by strategically sampling
along camera rays to construct hard negative examples. These examples, visually
challenging to differentiate from true positives, compel the model to learn
depth-aware features, thereby improving its capacity to distinguish between
true and false positives. Ray Denoising is designed as a plug-and-play module,
compatible with any DETR-style multi-view 3D detector, and it only minimally
increases training computational costs without affecting inference speed. Our
comprehensive experiments, including detailed ablation studies, consistently
demonstrate that Ray Denoising outperforms strong baselines across multiple
datasets. It achieves a 1.9% improvement in mean Average Precision (mAP) over
the state-of-the-art StreamPETR method on the NuScenes dataset. It shows
significant performance gains on the Argoverse 2 dataset, highlighting its
generalization capability. The code will be available at
https://github.com/LiewFeng/RayDN.
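The core idea in the abstract, sampling hard negatives along the camera ray through a ground-truth object so they share the object's image projection but differ in depth, can be illustrated with a minimal sketch. This is an assumption-laden toy example, not the paper's actual implementation; the function name, jitter scheme, and parameters are hypothetical.

```python
import numpy as np

def sample_ray_hard_negatives(cam_center, gt_center, num_samples=4,
                              depth_jitter=0.2, rng=None):
    """Sample hypothetical hard negatives along the camera ray through a
    ground-truth 3D object center. Points on this ray project to (almost)
    the same pixel, so they are visually hard to distinguish from the true
    object and differ only in depth."""
    if rng is None:
        rng = np.random.default_rng()
    ray = gt_center - cam_center
    depth = np.linalg.norm(ray)          # true depth along the ray
    direction = ray / depth              # unit ray direction
    # Perturb depth multiplicatively; keep samples away from the true depth.
    offsets = rng.uniform(depth_jitter, 3 * depth_jitter, size=num_samples)
    signs = rng.choice([-1.0, 1.0], size=num_samples)
    neg_depths = depth * (1.0 + signs * offsets)
    # (num_samples, 3) array of 3D points on the same camera ray.
    return cam_center + neg_depths[:, None] * direction[None, :]
```

During training, such samples would be fed to the decoder as denoising queries labeled as negatives, pushing the model to rely on depth-aware features rather than appearance alone.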
Related papers
- Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene [22.297964850282177]
We propose LiDAR-2D Self-paced Learning (LiSe) for unsupervised 3D detection.
RGB images serve as a valuable complement to LiDAR data, offering precise 2D localization cues.
Our framework devises a self-paced learning pipeline that incorporates adaptive sampling and weak model aggregation strategies.
arXiv Detail & Related papers (2024-07-11T14:58:49Z)
- NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, exhibits appealing performance on the ScanNetV2 and ARKitScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z)
- VirtualPainting: Addressing Sparsity with Virtual Points and Distance-Aware Data Augmentation for 3D Object Detection [3.5259183508202976]
We present an innovative approach that involves the generation of virtual LiDAR points using camera images.
We also enhance these virtual points with semantic labels obtained from image-based segmentation networks.
Our approach offers a versatile solution that can be seamlessly integrated into various 3D frameworks and 2D semantic segmentation methods.
arXiv Detail & Related papers (2023-12-26T18:03:05Z)
- Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization [56.95046107046027]
We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for scene coordinate regression.
Despite NeRF's efficiency in rendering, much of the rendered data is polluted by artifacts or contains only minimal information gain.
arXiv Detail & Related papers (2023-10-10T20:11:13Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
- SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z)
- Pattern-Aware Data Augmentation for LiDAR 3D Object Detection [7.394029879643516]
We propose pattern-aware ground truth sampling, a data augmentation technique that downsamples an object's point cloud based on the LiDAR's characteristics.
We improve the performance of PV-RCNN on the car class by more than 0.7 percent on the KITTI validation split at distances greater than 25 m.
arXiv Detail & Related papers (2021-11-30T19:14:47Z)
- RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising [8.142947808507369]
Time-of-Flight (ToF) cameras are subject to high levels of noise and distortion due to Multi-Path Interference (MPI).
We propose an iterative denoising approach operating in 3D space, that is designed to learn on 2.5D data by enabling 3D point convolutions to correct the points' positions along the view direction.
We demonstrate that our method is able to outperform SOTA methods on several datasets, including two real world datasets and a new large-scale synthetic data set introduced in this paper.
arXiv Detail & Related papers (2021-11-30T15:53:28Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed; then a pseudo-LiDAR point cloud representation is computed from the depth estimates; finally, object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.