LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and
Camera Fusion
- URL: http://arxiv.org/abs/2307.00724v4
- Date: Tue, 3 Oct 2023 10:07:26 GMT
- Title: LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and
Camera Fusion
- Authors: Weiyi Xiong, Jianan Liu, Tao Huang, Qing-Long Han, Yuxuan Xia, Bing
Zhu
- Abstract summary: This paper investigates the "sampling" view transformation strategy for camera and 4D imaging radar fusion-based 3D object detection.
We show that more accurate view transformation can be performed by introducing image depths and radar information to enhance the "sampling" strategy.
Experiments on VoD and TJ4DRadSet datasets show that the proposed method outperforms the state-of-the-art 3D object detection methods by a significant margin without bells and whistles.
- Score: 14.520176332262725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an emerging technology and a relatively affordable device, the 4D imaging
radar has already been confirmed effective in performing 3D object detection in
autonomous driving. Nevertheless, the sparsity and noisiness of 4D radar point
clouds hinder further performance improvement, and in-depth studies about its
fusion with other modalities are lacking. On the other hand, as a new image
view transformation strategy, "sampling" has been applied in a few image-based
detectors and shown to outperform the widely applied "depth-based splatting"
proposed in Lift-Splat-Shoot (LSS), even without image depth prediction.
However, the potential of "sampling" is not fully unleashed. This paper
investigates the "sampling" view transformation strategy on the camera and 4D
imaging radar fusion-based 3D object detection. LiDAR Excluded Lean (LXL)
model, predicted image depth distribution maps and radar 3D occupancy grids are
generated from image perspective view (PV) features and radar bird's eye view
(BEV) features, respectively. They are sent to the core of LXL, called "radar
occupancy-assisted depth-based sampling", to aid image view transformation. We
demonstrate that more accurate view transformation can be performed by
introducing image depths and radar information to enhance the "sampling"
strategy. Experiments on VoD and TJ4DRadSet datasets show that the proposed
method outperforms the state-of-the-art 3D object detection methods by a
significant margin without bells and whistles. Ablation studies demonstrate
that our method performs the best among different enhancement settings.
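To make the abstract's core idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of what "radar occupancy-assisted depth-based sampling" could look like in PyTorch. Unlike LSS-style splatting, which pushes image features forward into BEV, the "sampling" strategy pulls image features back to each 3D location: voxel centres are projected into the image, perspective-view (PV) features are gathered by bilinear sampling, and each sampled feature is re-weighted by the predicted image depth distribution and the radar 3D occupancy grid before the height dimension is collapsed into BEV. All tensor names, shapes, and the cam_proj projection helper are assumptions made for illustration only.

# Hypothetical sketch of radar occupancy-assisted depth-based sampling.
# Shapes, argument names, and cam_proj are illustrative assumptions, not the
# paper's code; out-of-image points would also need masking (omitted here).
import torch
import torch.nn.functional as F

def sample_view_transform(img_feat, depth_dist, radar_occ, voxel_xyz, cam_proj):
    """
    img_feat   : (B, C, H, W)   image perspective-view (PV) features
    depth_dist : (B, D, H, W)   predicted image depth distribution (softmax over D bins)
    radar_occ  : (B, Z, Y, X)   radar 3D occupancy grid in ego coordinates
    voxel_xyz  : (Z, Y, X, 3)   3D centre of every voxel in ego coordinates
    cam_proj   : callable mapping (N, 3) ego points -> (N, 3) [u, v, depth-bin index]
    returns    : (B, C, Y, X)   BEV features after height collapse
    """
    B, C, H, W = img_feat.shape
    D = depth_dist.shape[1]
    Z, Y, X, _ = voxel_xyz.shape

    # Project every voxel centre into the image plane.
    pts = voxel_xyz.reshape(-1, 3)                               # (Z*Y*X, 3)
    uvd = cam_proj(pts)                                          # (Z*Y*X, 3)

    # Normalise pixel coordinates to [-1, 1] for grid_sample.
    u = uvd[:, 0] / (W - 1) * 2 - 1
    v = uvd[:, 1] / (H - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1).view(1, 1, -1, 2).expand(B, 1, -1, 2)

    # "Sampling": bilinearly gather PV features at the projected locations.
    feat = F.grid_sample(img_feat, grid, align_corners=True).squeeze(2)      # (B, C, Z*Y*X)

    # Depth weight: probability that the pixel's depth falls into this voxel's bin.
    d_idx = uvd[:, 2].long().clamp(0, D - 1)
    depth_w = F.grid_sample(depth_dist, grid, align_corners=True).squeeze(2)  # (B, D, Z*Y*X)
    depth_w = depth_w.gather(1, d_idx.view(1, 1, -1).expand(B, 1, -1))        # (B, 1, Z*Y*X)

    # Radar occupancy weight for the same voxel.
    occ_w = radar_occ.reshape(B, 1, -1)                                       # (B, 1, Z*Y*X)

    # Re-weight the sampled features and collapse the height (Z) dimension to BEV.
    feat = (feat * depth_w * occ_w).view(B, C, Z, Y, X)
    return feat.sum(dim=2)                                                    # (B, C, Y, X)

In this reading, plain "sampling" corresponds to using the gathered features with uniform weights, while the depth and occupancy terms are the enhancements the abstract attributes to LXL.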
Related papers
- UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection [2.123197540438989]
Many radar-vision fusion models treat radar as a sparse LiDAR, underutilizing radar-specific information.
We propose the Radar Depth Lift-Splat-Shoot (RDL) module, which integrates radar-specific data into the depth prediction process.
We also introduce a Unified Feature Fusion (UFF) approach that extracts BEV features across different modalities.
arXiv Detail & Related papers (2024-09-23T06:57:27Z) - RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar [15.776076554141687]
The 3D occupancy-based perception pipeline has significantly advanced autonomous driving.
Current methods rely on LiDAR or camera inputs for 3D occupancy prediction.
We introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction.
arXiv Detail & Related papers (2024-05-22T21:48:17Z) - VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - 4DRVO-Net: Deep 4D Radar-Visual Odometry Using Multi-Modal and
Multi-Scale Adaptive Fusion [2.911052912709637]
Four-dimensional (4D) radar-visual odometry (4DRVO) integrates complementary information from 4D radar and cameras.
4DRVO may exhibit significant tracking errors owing to the sparsity of 4D radar point clouds.
We present 4DRVO-Net, a method for 4D radar-visual odometry.
arXiv Detail & Related papers (2023-08-12T14:00:09Z) - SMURF: Spatial Multi-Representation Fusion for 3D Object Detection with
4D Imaging Radar [12.842457981088378]
This paper introduces spatial multi-representation fusion (SMURF), a novel approach to 3D object detection using a single 4D imaging radar.
SMURF mitigates measurement inaccuracy caused by limited angular resolution and multi-path propagation of radar signals.
Experimental evaluations on View-of-Delft (VoD) and TJ4DRadSet datasets demonstrate the effectiveness and generalization ability of SMURF.
arXiv Detail & Related papers (2023-07-20T11:33:46Z) - DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z) - Bridging the View Disparity of Radar and Camera Features for Multi-modal
Fusion 3D Object Detection [6.959556180268547]
This paper focuses on how to utilize millimeter-wave (MMW) radar and camera sensor fusion for 3D object detection.
A novel method is proposed that realizes feature-level fusion under the bird's-eye view (BEV) for a better feature representation.
arXiv Detail & Related papers (2022-08-25T13:21:37Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z) - Shape Prior Non-Uniform Sampling Guided Real-time Stereo 3D Object
Detection [59.765645791588454]
The recently introduced RTS3D builds an efficient 4D Feature-Consistency Embedding space as the intermediate representation of objects without depth supervision.
We propose a shape prior non-uniform sampling strategy that performs dense sampling in the outer region and sparse sampling in the inner region.
Our proposed method achieves a 2.57% improvement in AP3D with almost no extra network parameters.
arXiv Detail & Related papers (2021-06-18T09:14:55Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)