HeightLane: BEV Heightmap guided 3D Lane Detection
- URL: http://arxiv.org/abs/2408.08270v2
- Date: Fri, 24 Jan 2025 09:18:44 GMT
- Title: HeightLane: BEV Heightmap guided 3D Lane Detection
- Authors: Chaesong Park, Eunbin Seo, Jongwoo Lim,
- Abstract summary: Accurate 3D lane detection from monocular images presents significant challenges due to depth ambiguity and imperfect ground modeling.
Our study introduces HeightLane, an innovative method that predicts a height map from monocular images by creating anchors based on a multi-slope assumption.
HeightLane achieves state-of-the-art performance in terms of F-score, highlighting its potential in real-world applications.
- Score: 6.940660861207046
- License:
- Abstract: Accurate 3D lane detection from monocular images presents significant challenges due to depth ambiguity and imperfect ground modeling. Previous attempts to model the ground have often used a planar ground assumption with limited degrees of freedom, making them unsuitable for complex road environments with varying slopes. Our study introduces HeightLane, an innovative method that predicts a height map from monocular images by creating anchors based on a multi-slope assumption. This approach provides a detailed and accurate representation of the ground. HeightLane employs the predicted heightmap along with a deformable attention-based spatial feature transform framework to efficiently convert 2D image features into 3D bird's eye view (BEV) features, enhancing spatial understanding and lane structure recognition. Additionally, the heightmap is used for the positional encoding of BEV features, further improving their spatial accuracy. This explicit view transformation bridges the gap between front-view perceptions and spatially accurate BEV representations, significantly improving detection performance. To address the lack of the necessary ground truth (GT) height map in the original OpenLane dataset, we leverage the Waymo dataset and accumulate its LiDAR data to generate a height map for the drivable area of each scene. The GT heightmaps are used to train the heightmap extraction module from monocular images. Extensive experiments on the OpenLane validation set show that HeightLane achieves state-of-the-art performance in terms of F-score, highlighting its potential in real-world applications.
Related papers
- BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization [11.50186721264038]
This paper addresses the problem of weakly supervised cross-view localization.
The goal is to estimate the pose of a ground camera relative to a satellite image with noisy ground truth annotations.
We propose BevSplat, a novel method that resolves height ambiguity by using feature-based Gaussian primitives.
arXiv Detail & Related papers (2025-02-13T08:54:04Z) - TS-SatMVSNet: Slope Aware Height Estimation for Large-Scale Earth Terrain Multi-view Stereo [19.509863059288037]
3D terrain reconstruction with remote sensing imagery achieves cost-effective and large-scale earth observation.
We propose an end-to-end slope-aware height estimation network named TS-SatMVSNet for large-scale remote sensing terrain reconstruction.
To fully integrate slope information into the MVS pipeline, we design two slope-guided modules to enhance reconstruction outcomes at both micro and macro levels.
arXiv Detail & Related papers (2025-01-02T04:18:40Z) - HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection [34.72603963887331]
HV-BEV is a novel approach that decouples feature sampling in the BEV grid queries paradigm into horizontal feature aggregation and vertical adaptive height-aware reference point sampling.
Our best-performing model achieves a remarkable 50.5% mAP and 59.8% NDS on the nuScenes testing set.
arXiv Detail & Related papers (2024-12-25T11:49:14Z) - RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion [58.77329237533034]
We propose a Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection.
RaCFormer achieves superior results of 64.9% mAP and 70.2% NDS on nuScenes, even outperforming several LiDAR-based detectors.
arXiv Detail & Related papers (2024-12-17T09:47:48Z) - HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning [22.871397412478274]
We introduce HeightMapNet, a novel framework that establishes a dynamic relationship between image features and road surface height distributions.
Our approach refines the accuracy of Bird's-Eye-View (BEV) features beyond conventional methods.
HeightMapNet has shown exceptional results on the challenging nuScenes and Argoverse 2 datasets.
arXiv Detail & Related papers (2024-11-03T02:35:17Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z) - CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity [34.025530326420146]
We develop Complementary-BEV, a novel end-to-end monocular 3D object detection framework.
We conduct extensive experiments on the public 3D detection benchmarks of roadside camera-based DAIR-V2X-I and Rope3D.
For the first time, the vehicle AP score of a camera model reaches 80% on DAIR-V2X-I in terms of easy mode.
arXiv Detail & Related papers (2023-10-04T13:38:53Z) - Vision Transformers, a new approach for high-resolution and large-scale
mapping of canopy heights [50.52704854147297]
We present a new vision transformer (ViT) model optimized with a classification (discrete) and a continuous loss function.
This model achieves better accuracy than previously used convolutional based approaches (ConvNets) optimized with only a continuous loss function.
arXiv Detail & Related papers (2023-04-22T22:39:03Z) - TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view
Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense framework.
For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of alignments.
TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z) - ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object
Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straight-forward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.