HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning
- URL: http://arxiv.org/abs/2411.01408v1
- Date: Sun, 03 Nov 2024 02:35:17 GMT
- Title: HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning
- Authors: Wenzhao Qiu, Shanmin Pang, Hao Zhang, Jianwu Fang, Jianru Xue
- Abstract summary: We introduce HeightMapNet, a novel framework that establishes a dynamic relationship between image features and road surface height distributions.
Our approach refines the accuracy of Bird's-Eye-View (BEV) features beyond conventional methods.
HeightMapNet has shown exceptional results on the challenging nuScenes and Argoverse 2 datasets.
- Score: 22.871397412478274
- Abstract: Recent advances in high-definition (HD) map construction from surround-view images have highlighted their cost-effectiveness in deployment. However, prevailing techniques often fall short in accurately extracting and utilizing road features, as well as in the implementation of view transformation. In response, we introduce HeightMapNet, a novel framework that establishes a dynamic relationship between image features and road surface height distributions. By integrating height priors, our approach refines the accuracy of Bird's-Eye-View (BEV) features beyond conventional methods. HeightMapNet also introduces a foreground-background separation network that sharply distinguishes between critical road elements and extraneous background components, enabling precise focus on detailed road micro-features. Additionally, our method leverages multi-scale features within the BEV space, optimally utilizing spatial geometric information to boost model performance. HeightMapNet has shown exceptional results on the challenging nuScenes and Argoverse 2 datasets, outperforming several widely recognized approaches. The code will be available at https://github.com/adasfag/HeightMapNet/.
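The abstract states that HeightMapNet ties image features to a distribution over road-surface heights but does not describe the mechanism. As a rough illustration of how a height prior can enter a view transformation in general, the following minimal NumPy sketch projects each BEV cell into the image at several candidate road heights and averages the sampled features, weighted by a softmax distribution over those heights. It assumes a single pinhole camera, nearest-pixel sampling, and an ego frame with x forward, y left and z up; the function names, argument shapes and conventions are hypothetical and are not taken from the HeightMapNet code.

```python
import numpy as np


def project_to_image(points_ego, intrinsics, ego_to_cam):
    """Project (N, 3) ego-frame points to pixels (hypothetical helper).

    Returns (uv, depth): (N, 2) pixel coordinates and (N,) camera-frame depth.
    """
    n = points_ego.shape[0]
    homo = np.concatenate([points_ego, np.ones((n, 1))], axis=1)  # (N, 4)
    cam = (ego_to_cam @ homo.T).T[:, :3]                           # camera frame
    depth = cam[:, 2]
    uvw = (intrinsics @ cam.T).T                                   # (N, 3)
    # Avoid dividing by zero or negative depth; those points are masked later.
    safe = np.where(depth > 1e-6, uvw[:, 2], 1.0)
    return uvw[:, :2] / safe[:, None], depth


def height_weighted_bev(img_feat, height_logits, bev_xy, height_bins,
                        intrinsics, ego_to_cam):
    """Lift image features to BEV cells using a per-cell height distribution.

    Hypothetical sketch, not the HeightMapNet implementation.
    img_feat:      (C, H, W) feature map of one camera
    height_logits: (P, K) unnormalised scores over K candidate heights per cell
    bev_xy:        (P, 2) BEV cell centres (x forward, y left) in metres
    height_bins:   (K,) candidate road-surface heights z in metres
    Returns a (P, C) array of BEV features.
    """
    C, H, W = img_feat.shape
    P, K = height_logits.shape
    # Numerically stable softmax over the height axis.
    logits = height_logits - height_logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Row i*K + k of the query array is (cell i, candidate height k).
    xy = np.repeat(bev_xy, K, axis=0)
    z = np.tile(height_bins, P)[:, None]
    uv, depth = project_to_image(np.concatenate([xy, z], axis=1),
                                 intrinsics, ego_to_cam)

    # Nearest-pixel sampling; points behind the camera or off-image get zeros.
    u = np.round(uv[:, 0]).astype(np.int64)
    v = np.round(uv[:, 1]).astype(np.int64)
    valid = (depth > 1e-6) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    sampled = np.zeros((P * K, C))
    sampled[valid] = img_feat[:, v[valid], u[valid]].T

    # Expected feature under the predicted height distribution.
    return (probs[:, :, None] * sampled.reshape(P, K, C)).sum(axis=1)
```

For example, with a camera mounted 1.5 m above the road (`ego_to_cam = [[0,-1,0,0],[0,0,-1,1.5],[1,0,0,0],[0,0,0,1]]`), a cell 10 m ahead lands on a lower image row at z = 0 than at z = 1.5. A peaked height distribution makes the BEV feature follow the height it favours, while a uniform one blends both samples; predicting the distribution per cell is one way to avoid assuming a single flat ground plane.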
Related papers
- TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior [70.84644266024571]
We propose to train a perception model to "see" standard-definition maps (SDMaps).
We encode SDMap elements into neural spatial map representations and instance tokens, and then incorporate such complementary features as prior information.
Based on the lane segment representation framework, the model simultaneously predicts lanes, centrelines and their topology.
Date: 2024-11-22T06:13:42Z
- Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction [28.071645239063553]
We present Deep Height Decoupling (DHD), a novel framework that incorporates explicit height priors to filter out confusing features.
On the popular Occ3D-nuScenes benchmark, our method achieves state-of-the-art performance even with minimal input frames.
Date: 2024-09-12T12:12:19Z
- HeightLane: BEV Heightmap guided 3D Lane Detection [6.940660861207046]
Accurate 3D lane detection from monocular images presents significant challenges due to depth ambiguity and imperfect ground modeling.
Our study introduces HeightLane, an innovative method that predicts a height map from monocular images by creating anchors based on a multi-slope assumption.
HeightLane achieves state-of-the-art performance in terms of F-score, highlighting its potential in real-world applications.
Date: 2024-08-15T17:14:57Z
- TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes [58.180556221044235]
We present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception.
Our formulation is designed for dynamic scenes, consisting of small moving objects or human actions.
We evaluate its performance on challenging datasets, including Okutama Action and UG2.
Date: 2024-05-04T21:55:33Z
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged as a task aimed at high-precision object segmentation in high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by it, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
Date: 2024-04-11T03:00:00Z
- Pixel to Elevation: Learning to Predict Elevation Maps at Long Range using Images for Autonomous Offroad Navigation [10.898724668444125]
We present a learning-based approach capable of predicting terrain elevation maps at long-range using only onboard egocentric images in real-time.
We experimentally validate the applicability of our proposed approach for autonomous offroad robotic navigation in complex and unstructured terrain.
Date: 2024-01-30T22:37:24Z
- Hi-Map: Hierarchical Factorized Radiance Field for High-Fidelity Monocular Dense Mapping [51.739466714312805]
We introduce Hi-Map, a novel monocular dense mapping approach based on Neural Radiance Fields (NeRF).
Hi-Map is exceptional in its capacity to achieve efficient and high-fidelity mapping using only posed RGB inputs.
Date: 2024-01-06T12:32:25Z
- Sharp Eyes: A Salient Object Detector Working The Same Way as Human Visual Characteristics [3.222802562733787]
We propose a sharp eyes network (SENet) that first separates the object from the scene and then finely segments it.
The proposed method uses the expanded objects to guide the network toward a complete prediction.
Date: 2023-01-18T11:00:45Z
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
Date: 2022-11-15T13:52:41Z
- TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense mapping framework.
For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of alignments.
TANDEM shows state-of-the-art real-time 3D reconstruction performance.
Date: 2021-11-14T19:01:02Z
- Diff-Net: Image Feature Difference based High-Definition Map Change Detection [13.666189678747996]
Up-to-date High-Definition (HD) maps are essential for self-driving cars.
We present a deep neural network (DNN), Diff-Net, to detect changes in them.
Results demonstrate that Diff-Net outperforms the baseline methods and is ready to be integrated into a map production pipeline that maintains an up-to-date HD map.
Date: 2021-07-14T22:51:30Z