Related papers: HeightLane: BEV Heightmap guided 3D Lane Detection

HeightLane: BEV Heightmap guided 3D Lane Detection

URL: http://arxiv.org/abs/2408.08270v1
Date: Thu, 15 Aug 2024 17:14:57 GMT
Title: HeightLane: BEV Heightmap guided 3D Lane Detection
Authors: Chaesong Park, Eunbin Seo, Jongwoo Lim,
Abstract summary: Accurate 3D lane detection from monocular images presents significant challenges due to depth ambiguity and imperfect ground modeling. Our study introduces HeightLane, an innovative method that predicts a height map from monocular images by creating anchors based on a multi-slope assumption. HeightLane achieves state-of-the-art performance in terms of F-score, highlighting its potential in real-world applications.
Score: 6.940660861207046
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate 3D lane detection from monocular images presents significant challenges due to depth ambiguity and imperfect ground modeling. Previous attempts to model the ground have often used a planar ground assumption with limited degrees of freedom, making them unsuitable for complex road environments with varying slopes. Our study introduces HeightLane, an innovative method that predicts a height map from monocular images by creating anchors based on a multi-slope assumption. This approach provides a detailed and accurate representation of the ground. HeightLane employs the predicted heightmap along with a deformable attention-based spatial feature transform framework to efficiently convert 2D image features into 3D bird's eye view (BEV) features, enhancing spatial understanding and lane structure recognition. Additionally, the heightmap is used for the positional encoding of BEV features, further improving their spatial accuracy. This explicit view transformation bridges the gap between front-view perceptions and spatially accurate BEV representations, significantly improving detection performance. To address the lack of the necessary ground truth (GT) height map in the original OpenLane dataset, we leverage the Waymo dataset and accumulate its LiDAR data to generate a height map for the drivable area of each scene. The GT heightmaps are used to train the heightmap extraction module from monocular images. Extensive experiments on the OpenLane validation set show that HeightLane achieves state-of-the-art performance in terms of F-score, highlighting its potential in real-world applications.

Related papers

HeightFormer: Learning Height Prediction in Voxel Features for Roadside Vision Centric 3D Object Detection via Transformer [8.975640955554555]
A framework learning height prediction in voxel features via transformer is proposed, dubbed HeightFormer. It groups the voxel features into local height sequences, and utilize attention mechanism to obtain height distribution prediction. The proposed method is applied to two large-scale roadside benchmarks, DAIR-V2X-I and Rope3D.
arXiv Detail & Related papers (2025-03-13T18:17:19Z)
TS-SatMVSNet: Slope Aware Height Estimation for Large-Scale Earth Terrain Multi-view Stereo [19.509863059288037]
3D terrain reconstruction with remote sensing imagery achieves cost-effective and large-scale earth observation. We propose an end-to-end slope-aware height estimation network named TS-SatMVSNet for large-scale remote sensing terrain reconstruction. To fully integrate slope information into the MVS pipeline, we design two slope-guided modules to enhance reconstruction outcomes at both micro and macro levels.
arXiv Detail & Related papers (2025-01-02T04:18:40Z)
HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection [34.72603963887331]
HV-BEV is a novel approach that decouples feature sampling in the BEV grid queries paradigm into horizontal feature aggregation and vertical adaptive height-aware reference point sampling. Our best-performing model achieves a remarkable 50.5% mAP and 59.8% NDS on the nuScenes testing set.
arXiv Detail & Related papers (2024-12-25T11:49:14Z)
RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion [58.77329237533034]
We propose a Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection. RaCFormer achieves superior results of 64.9% mAP and 70.2% on nuScenes datasets.
arXiv Detail & Related papers (2024-12-17T09:47:48Z)
HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning [22.871397412478274]
We introduce HeightMapNet, a novel framework that establishes a dynamic relationship between image features and road surface height distributions. Our approach refines the accuracy of Bird's-Eye-View (BEV) features beyond conventional methods. HeightMapNet has shown exceptional results on the challenging nuScenes and Argoverse 2 datasets.
arXiv Detail & Related papers (2024-11-03T02:35:17Z)
GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics. In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior rendering by enabling faster scale awareness, and improved trajectory tracking. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
Pixel to Elevation: Learning to Predict Elevation Maps at Long Range using Images for Autonomous Offroad Navigation [10.898724668444125]
We present a learning-based approach capable of predicting terrain elevation maps at long-range using only onboard egocentric images in real-time. We experimentally validate the applicability of our proposed approach for autonomous offroad robotic navigation in complex and unstructured terrain.
arXiv Detail & Related papers (2024-01-30T22:37:24Z)
CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity [34.025530326420146]
We develop Complementary-BEV, a novel end-to-end monocular 3D object detection framework. We conduct extensive experiments on the public 3D detection benchmarks of roadside camera-based DAIR-V2X-I and Rope3D. For the first time, the vehicle AP score of a camera model reaches 80% on DAIR-V2X-I in terms of easy mode.
arXiv Detail & Related papers (2023-10-04T13:38:53Z)
OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios [0.0]
We propose OCTraN, a transformer architecture that uses iterative-attention to convert 2D image features into 3D occupancy features. We also develop a self-supervised training pipeline to generalize the model to any scene by eliminating the need for LiDAR ground truth.
arXiv Detail & Related papers (2023-07-20T15:06:44Z)
Vision Transformers, a new approach for high-resolution and large-scale mapping of canopy heights [50.52704854147297]
We present a new vision transformer (ViT) model optimized with a classification (discrete) and a continuous loss function. This model achieves better accuracy than previously used convolutional based approaches (ConvNets) optimized with only a continuous loss function.
arXiv Detail & Related papers (2023-04-22T22:39:03Z)
TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense framework. For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of alignments. TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z)
ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straight-forward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.