Related papers: Enhancing LiDAR Point Features with Foundation Model Priors for 3D Object Detection

Enhancing LiDAR Point Features with Foundation Model Priors for 3D Object Detection

URL: http://arxiv.org/abs/2507.13899v1
Date: Fri, 18 Jul 2025 13:24:32 GMT
Title: Enhancing LiDAR Point Features with Foundation Model Priors for 3D Object Detection
Authors: Yujian Mo, Yan Wu, Junqiao Zhao, Jijun Wang, Yinghao Hu, Jun Yan,
Abstract summary: In this paper, we introduce depth priors predicted by DepthAnything.<n>These priors are fused with the original LiDAR attributes to enrich each point's representation.<n>Experiments on the KITTI benchmark show that our method consistently improves detection accuracy.
Score: 5.6537425944368405
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in foundation models have opened up new possibilities for enhancing 3D perception. In particular, DepthAnything offers dense and reliable geometric priors from monocular RGB images, which can complement sparse LiDAR data in autonomous driving scenarios. However, such priors remain underutilized in LiDAR-based 3D object detection. In this paper, we address the limited expressiveness of raw LiDAR point features, especially the weak discriminative capability of the reflectance attribute, by introducing depth priors predicted by DepthAnything. These priors are fused with the original LiDAR attributes to enrich each point's representation. To leverage the enhanced point features, we propose a point-wise feature extraction module. Then, a Dual-Path RoI feature extraction framework is employed, comprising a voxel-based branch for global semantic context and a point-based branch for fine-grained structural details. To effectively integrate the complementary RoI features, we introduce a bidirectional gated RoI feature fusion module that balances global and local cues. Extensive experiments on the KITTI benchmark show that our method consistently improves detection accuracy, demonstrating the value of incorporating visual foundation model priors into LiDAR-based 3D object detection.

Related papers

LDRFusion: A LiDAR-Dominant multimodal refinement framework for 3D object detection [5.6537425944368405]
Existing LiDAR-Camera fusion methods have achieved strong results in 3D object detection.<n>We propose LDRFusion, a novel Lidar-dominant two-stage refinement framework for multi-sensor fusion.<n>Our framework consistently achieves strong performance across multiple categories and difficulty levels.
arXiv Detail & Related papers (2025-07-22T04:35:52Z)
MonoDINO-DETR: Depth-Enhanced Monocular 3D Object Detection Using a Vision Foundation Model [2.0624236247076397]
This study employs a Vision Transformer (ViT)-based foundation model as the backbone, which excels at capturing global features for depth estimation.<n>It integrates a detection transformer (DETR) architecture to improve both depth estimation and object detection performance in a one-stage manner.<n>The proposed model outperforms recent state-of-the-art methods, as demonstrated through evaluations on the KITTI 3D benchmark and a custom dataset collected from high-elevation racing environments.
arXiv Detail & Related papers (2025-02-01T04:37:13Z)
PointOBB-v3: Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection [65.84604846389624]
We propose PointOBB-v3, a stronger single point-supervised OOD framework.<n>It generates pseudo rotated boxes without additional priors and incorporates support for the end-to-end paradigm.<n>Our method achieves an average improvement in accuracy of 3.56% in comparison to previous state-of-the-art methods.
arXiv Detail & Related papers (2025-01-23T18:18:15Z)
PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds [29.15589024703907]
In this paper, we revisit the local point aggregators from the perspective of allocating computational resources. We find that the simplest pillar based models perform surprisingly well considering both accuracy and latency. Our results challenge the common intuition that the detailed geometry modeling is essential to achieve high performance for 3D object detection.
arXiv Detail & Related papers (2023-05-08T17:59:14Z)
LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields [112.62936571539232]
We introduce a new task, novel view synthesis for LiDAR sensors. Traditional model-based LiDAR simulators with style-transfer neural networks can be applied to render novel views. We use a neural radiance field (NeRF) to facilitate the joint learning of geometry and the attributes of 3D points.
arXiv Detail & Related papers (2023-04-20T15:44:37Z)
AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation. We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector. The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference. Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
RBGNet: Ray-based Grouping for 3D Object Detection [104.98776095895641]
We propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds. We propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays. Our model achieves state-of-the-art 3D detection performance on ScanNet V2 and SUN RGB-D with remarkable performance gains.
arXiv Detail & Related papers (2022-04-05T14:42:57Z)
SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA) Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling. In practice, SASA shows to be effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection.
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
SIENet: Spatial Information Enhancement Network for 3D Object Detection from Point Cloud [20.84329063509459]
LiDAR-based 3D object detection pushes forward an immense influence on autonomous vehicles. Due to the limitation of the intrinsic properties of LiDAR, fewer points are collected at the objects farther away from the sensor. To address the challenge, we propose a novel two-stage 3D object detection framework, named SIENet.
arXiv Detail & Related papers (2021-03-29T07:45:09Z)
CAE-LO: LiDAR Odometry Leveraging Fully Unsupervised Convolutional Auto-Encoder for Interest Point Detection and Feature Description [10.73965992177754]
We propose a fully unsupervised Conal Auto-Encoder based LiDAR Odometry (CAE-LO) that detects interest points from spherical ring data using 2D CAE and extracts features from multi-resolution voxel model using 3D CAE. We make several key contributions: 1) experiments based on KITTI dataset show that our interest points can capture more local details to improve the matching success rate on unstructured scenarios and our features outperform state-of-the-art by more than 50% in matching inlier ratio.
arXiv Detail & Related papers (2020-01-06T01:26:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.