Related papers: Depth3DLane: Monocular 3D Lane Detection via Depth Prior Distillation

Depth3DLane: Monocular 3D Lane Detection via Depth Prior Distillation

URL: http://arxiv.org/abs/2504.18325v1
Date: Fri, 25 Apr 2025 13:08:41 GMT
Title: Depth3DLane: Monocular 3D Lane Detection via Depth Prior Distillation
Authors: Dongxin Lyu, Han Huang, Cheng Tan, Zimu Li,
Abstract summary: We introduce a BEV-based framework to address limitations and improve 3D lane detection accuracy.<n>We leverage Depth Prior Distillation to transfer semantic depth knowledge from a teacher model.<n>Our method achieves state-of-the-art performance in terms of z-axis error.
Score: 5.909083729156255
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Monocular 3D lane detection is challenging due to the difficulty in capturing depth information from single-camera images. A common strategy involves transforming front-view (FV) images into bird's-eye-view (BEV) space through inverse perspective mapping (IPM), facilitating lane detection using BEV features. However, IPM's flat-ground assumption and loss of contextual information lead to inaccuracies in reconstructing 3D information, especially height. In this paper, we introduce a BEV-based framework to address these limitations and improve 3D lane detection accuracy. Our approach incorporates a Hierarchical Depth-Aware Head that provides multi-scale depth features, mitigating the flat-ground assumption by enhancing spatial awareness across varying depths. Additionally, we leverage Depth Prior Distillation to transfer semantic depth knowledge from a teacher model, capturing richer structural and contextual information for complex lane structures. To further refine lane continuity and ensure smooth lane reconstruction, we introduce a Conditional Random Field module that enforces spatial coherence in lane predictions. Extensive experiments validate that our method achieves state-of-the-art performance in terms of z-axis error and outperforms other methods in the field in overall performance. The code is released at: https://anonymous.4open.science/r/Depth3DLane-DCDD.

Related papers

FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers [91.59069344768858]
We introduce Frequency-aware Positional Depth Embedding (FreqPDE) to equip 2D image features with spatial information for 3D detection transformer decoder.<n>FreqPDE combines the 2D image features and 3D position embeddings to generate 3D depth-aware features for query decoding.
arXiv Detail & Related papers (2025-10-17T07:36:54Z)
Depth3DLane: Fusing Monocular 3D Lane Detection with Self-Supervised Monocular Depth Estimation [15.904739731978236]
Multi-modal approaches rely on expensive depth sensors, while fully-supervised depth networks rely on ground-truth depth data that is impractical to collect at scale.<n>We propose Depth3DLane, a novel dual-pathway framework that integrates self-supervised monocular depth estimation.<n>We show that Depth3DLane achieves competitive performance on the OpenLane benchmark dataset.
arXiv Detail & Related papers (2025-07-18T12:23:47Z)
DB3D-L: Depth-aware BEV Feature Transformation for Accurate 3D Lane Detection [3.7115000857388494]
3D Lane detection plays an important role in autonomous driving.<n>Recent advances primarily build Birds-Eye-View (BEV) feature from front-view (FV) images to perceive 3D information of Lane more effectively.<n>However, constructing accurate BEV information from FV image is limited due to the lacking of depth information.
arXiv Detail & Related papers (2025-05-19T15:47:20Z)
Revisiting Monocular 3D Object Detection with Depth Thickness Field [44.4805861813093]
We present MonoDTF, a scene-to-instance depth-adapted network for monocular 3D object detection. The framework mainly comprises a Scene-Level Depth Retargeting (SDR) module and an Instance-Level Spatial Refinement (ISR) module. The latter refines the voxel space with the guidance of instances, enhancing the 3D instance-aware capability of the depth thickness field.
arXiv Detail & Related papers (2024-12-26T10:51:50Z)
Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression [38.70696274059616]
We propose a novel BEV-free method named Anchor3DLane++ which defines 3D lane anchors as structural representations and makes predictions directly from FV features.<n>Our experiments on three popular 3D lane detection benchmarks show that our Anchor3DLane++ outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2024-12-22T06:52:10Z)
Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation. Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model. Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning [93.71280187657831]
Camera-based bird-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field. We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity [34.025530326420146]
We develop Complementary-BEV, a novel end-to-end monocular 3D object detection framework. We conduct extensive experiments on the public 3D detection benchmarks of roadside camera-based DAIR-V2X-I and Rope3D. For the first time, the vehicle AP score of a camera model reaches 80% on DAIR-V2X-I in terms of easy mode.
arXiv Detail & Related papers (2023-10-04T13:38:53Z)
Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View [44.78243406441798]
This paper focuses on leveraging geometry information, such as depth, to model such feature transformation. We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view. We then aggregate the 3D feature volume based on the 3D space occupancy derived from depth to the BEV frame.
arXiv Detail & Related papers (2023-07-09T06:07:22Z)
Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain. GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors. We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View [11.958753088613637]
We first analyze the causes of the domain gap for the MV3D-Det task. To acquire a robust depth prediction, we propose to decouple the depth estimation from intrinsic parameters of the camera. We modify the focal length values to create multiple pseudo-domains and construct an adversarial training loss to encourage the feature representation to be more domain-agnostic.
arXiv Detail & Related papers (2023-03-03T02:59:13Z)
Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection. We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment. Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z)
Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems. Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results. This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation. We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.