Improving Hierarchical Representations of Vectorized HD Maps with Perspective Clues
- URL: http://arxiv.org/abs/2404.11155v2
- Date: Sat, 11 Oct 2025 15:29:29 GMT
- Title: Improving Hierarchical Representations of Vectorized HD Maps with Perspective Clues
- Authors: Chi Zhang, Qi Song, Feifei Li, Jie Li, Rui Huang
- Abstract summary: We propose PerCMap, which exploits clues from perspective-view features at both the instance and point levels. PerCMap achieves strong and consistent performance across benchmarks, reaching 67.1 mAP on nuScenes and 70.5 mAP on Argoverse 2.
- Score: 20.730599565199935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The construction of vectorized High-Definition (HD) maps from onboard surround-view cameras has become a significant focus in autonomous driving. However, current map vector estimation pipelines face two key limitations: input-agnostic queries struggle to capture complex map structures, and the view transformation leads to information loss. These issues often result in inaccurate shape restoration or missing instances in map predictions. To address these issues, we propose a novel approach, PerCMap, which explicitly exploits clues from perspective-view features at both the instance and point levels. Specifically, at the instance level, we propose Cross-view Instance Activation (CIA) to activate instance queries across surround-view images, thereby helping the model recover the instance attributes of map vectors. At the point level, we design Dual-view Point Embedding (DPE), which fuses features from both views to generate input-aware positional embeddings and improve the accuracy of point coordinate estimation. Extensive experiments on nuScenes and Argoverse 2 demonstrate that PerCMap achieves strong and consistent performance across benchmarks, reaching 67.1 and 70.5 mAP, respectively.
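To make the Dual-view Point Embedding idea concrete, the fusion of per-point features from the BEV and perspective views into an input-aware positional embedding can be sketched roughly as below. This is a minimal NumPy sketch under stated assumptions: the function name, the concatenate-and-project fusion, and all shapes are illustrative choices, not the paper's published implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dual_view_point_embedding(bev_feat, pv_feat, w, b):
    """Fuse BEV and perspective-view point features into one
    input-aware positional embedding (hypothetical sketch).

    bev_feat, pv_feat: (num_points, c) features sampled at each
    predicted point from the BEV map and the perspective images.
    w: (2*c, d) projection weights; b: (d,) bias.
    """
    fused = np.concatenate([bev_feat, pv_feat], axis=-1)  # (num_points, 2c)
    return fused @ w + b                                  # (num_points, d)

# toy shapes: 20 points per instance, 64-dim features, 256-dim embeddings
n, c, d = 20, 64, 256
bev = rng.standard_normal((n, c))
pv = rng.standard_normal((n, c))
w = rng.standard_normal((2 * c, c * 0 + d)) * 0.02
b = np.zeros(d)

emb = dual_view_point_embedding(bev, pv, w, b)
print(emb.shape)  # (20, 256)
```

The key point the sketch captures is that the embedding depends on the input features of both views, unlike a fixed, input-agnostic positional embedding.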
Related papers
- SEPT: Standard-Definition Map Enhanced Scene Perception and Topology Reasoning for Autonomous Driving [33.58763384551353]
We propose a Standard-Definition (SD) Map Enhanced Perception and Topology reasoning framework. Our framework significantly improves both scene perception and topology reasoning, outperforming existing methods by a substantial margin.
arXiv Detail & Related papers (2025-05-18T05:57:31Z) - Semi-Supervised 360 Layout Estimation with Panoramic Collaborative Perturbations [56.84921040837699]
We propose a novel semi-supervised method named Semi360, which incorporates the priors of the panoramic layout and distortion through collaborative perturbations.
Our experimental results on three mainstream benchmarks demonstrate that the proposed method offers significant advantages over existing state-of-the-art (SoTA) solutions.
arXiv Detail & Related papers (2025-03-03T02:49:20Z) - TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior [70.84644266024571]
We propose to train a perception model to "see" standard definition maps (SDMaps).
We encode SDMap elements into neural spatial map representations and instance tokens, and then incorporate such complementary features as prior information.
Based on the lane segment representation framework, the model simultaneously predicts lanes, centrelines and their topology.
arXiv Detail & Related papers (2024-11-22T06:13:42Z) - VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization [108.68014173017583]
Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics for the environmental elements around the ego car.
We propose to utilize a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge for the high-level BEV semantics in the tokenized discrete space.
Thanks to the obtained BEV tokens, accompanied by a codebook embedding encapsulating the semantics of the different BEV elements in the ground-truth maps, we are able to directly align the sparse backbone image features with the obtained BEV tokens.
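The VQ-VAE-style bottleneck described above, which maps continuous BEV features to discrete tokens by nearest-neighbor lookup in a codebook, can be sketched in a few lines. This is a generic vector-quantization sketch, not the paper's code; the function name, toy codebook, and feature values are all assumptions for illustration.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry
    (squared Euclidean distance), returning discrete token ids
    and the quantized vectors, as in a VQ-VAE bottleneck.

    features: (n, d), codebook: (k, d)
    """
    # pairwise squared distances between features and codebook entries
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, k)
    tokens = d2.argmin(axis=1)       # (n,) discrete BEV tokens
    return tokens, codebook[tokens]  # quantized features, (n, d)

# toy 2-D codebook with three entries and two BEV feature vectors
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
feats = np.array([[0.1, -0.1], [0.9, 1.2]])
tokens, quantized = quantize(feats, codebook)
print(tokens.tolist())  # [0, 1]
```

Aligning image features with such tokens then reduces to a classification over the `k` codebook indices rather than regression in a continuous feature space.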
arXiv Detail & Related papers (2024-11-03T16:09:47Z) - HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning [22.871397412478274]
We introduce HeightMapNet, a novel framework that establishes a dynamic relationship between image features and road surface height distributions.
Our approach refines the accuracy of Bird's-Eye-View (BEV) features beyond conventional methods.
HeightMapNet has shown exceptional results on the challenging nuScenes and Argoverse 2 datasets.
arXiv Detail & Related papers (2024-11-03T02:35:17Z) - MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction [75.93907511203317]
We propose MGMapNet (Multi-Granularity Map Network) to model map elements with a multi-granularity representation.
The proposed MGMapNet achieves state-of-the-art performance, surpassing MapTRv2 by 5.3 mAP on nuScenes and 4.4 mAP on Argoverse 2, respectively.
arXiv Detail & Related papers (2024-10-10T09:05:23Z) - DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction [20.6143278960295]
This paper focuses on temporal instance consistency and temporal map consistency learning.
DTCLMapper is a dual-stream temporal consistency learning module that combines instance embedding with geometry maps.
Experiments on well-recognized benchmarks indicate that the proposed DTCLMapper achieves state-of-the-art performance in vectorized mapping tasks.
arXiv Detail & Related papers (2024-05-09T02:58:55Z) - ADMap: Anti-disturbance framework for reconstructing online vectorized
HD map [9.218463154577616]
This paper proposes the Anti-disturbance Map reconstruction framework (ADMap).
To mitigate point-order jitter, the framework consists of three modules: Multi-Scale Perception Neck, Instance Interactive Attention (IIA), and Vector Direction Difference Loss (VDDL).
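A direction-difference penalty of the kind the VDDL name suggests could be sketched as below: comparing the edge directions of a predicted polyline against the ground truth, so that point-order jitter is penalized even when point positions are close. This is a hypothetical sketch; ADMap's actual loss formulation is not reproduced here.

```python
import numpy as np

def vector_direction_difference(pred, gt):
    """Mean (1 - cosine) difference between corresponding edge
    directions of a predicted and a ground-truth polyline
    (hypothetical sketch of a direction-consistency penalty).

    pred, gt: (n, 2) ordered point coordinates; consecutive
    points are assumed distinct so edge vectors are nonzero.
    """
    dp = np.diff(pred, axis=0)  # (n-1, 2) predicted edge vectors
    dg = np.diff(gt, axis=0)    # (n-1, 2) ground-truth edge vectors
    cos = (dp * dg).sum(-1) / (
        np.linalg.norm(dp, axis=-1) * np.linalg.norm(dg, axis=-1)
    )
    return float((1.0 - cos).mean())  # 0 when all edge directions agree

gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
loss = vector_direction_difference(gt, gt)
print(loss)  # 0.0
```

A loss like this leaves the penalty at zero for any prediction whose edges point the same way as the ground truth, while flipped or reordered points raise it toward 2.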
arXiv Detail & Related papers (2024-01-24T01:37:27Z) - Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on Cityscapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z) - Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection [57.646582245834324]
We propose a simple yet effective deepfake detector called LSDA.
It is based on a simple idea: representations trained on a wider variety of forgeries should learn a more generalizable decision boundary.
We show that our proposed method is surprisingly effective and transcends state-of-the-art detectors across several widely used benchmarks.
arXiv Detail & Related papers (2023-11-19T09:41:10Z) - Impression-Informed Multi-Behavior Recommender System: A Hierarchical
Graph Attention Approach [4.03161352925235]
We introduce the Hierarchical Multi-behavior Graph Attention Network (HMGN).
This pioneering framework leverages attention mechanisms to discern information from both inter and intra-behaviors.
We register a notable performance boost of up to 64% in the NDCG@100 metric over conventional graph neural network methods.
arXiv Detail & Related papers (2023-09-06T17:09:43Z) - InsMapper: Exploring Inner-instance Information for Vectorized HD
Mapping [41.59891369655983]
InsMapper harnesses inner-instance information for vectorized high-definition mapping through transformers.
InsMapper surpasses the previous state-of-the-art method, demonstrating its effectiveness and generality.
arXiv Detail & Related papers (2023-08-16T17:58:28Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - VS-TransGRU: A Novel Transformer-GRU-based Framework Enhanced by
Visual-Semantic Fusion for Egocentric Action Anticipation [33.41226268323332]
Egocentric action anticipation is a challenging task that aims to make advanced predictions of future actions in the first-person view.
Most existing methods focus on improving the model architecture and loss function based on the visual input and recurrent neural network.
We propose a novel Transformer-GRU-based action anticipation framework enhanced by visual-semantic fusion.
arXiv Detail & Related papers (2023-07-08T06:49:54Z) - Online Map Vectorization for Autonomous Driving: A Rasterization
Perspective [58.71769343511168]
We introduce a new rasterization-based evaluation metric, which has superior sensitivity and is better suited to real-world autonomous driving scenarios.
We also propose MapVR (Map Vectorization via Rasterization), a novel framework that applies differentiable rasterization to precise vectorized outputs and then performs geometry-aware supervision on HD maps.
arXiv Detail & Related papers (2023-06-18T08:51:14Z) - Weakly Supervised Video Salient Object Detection via Point Supervision [18.952253968878356]
We propose a strong baseline model based on point supervision.
To infer saliency maps with temporal information, we mine inter-frame complementary information from short-term and long-term perspectives.
We label two point-supervised datasets, P-DAVIS and P-DAVSOD, by relabeling the DAVIS and the DAVSOD dataset.
arXiv Detail & Related papers (2022-07-15T03:31:15Z) - VectorMapNet: End-to-end Vectorized HD Map Learning [18.451587680552464]
We introduce an end-to-end vectorized HD map learning pipeline, termed VectorMapNet.
This pipeline can explicitly model the spatial relation between map elements and generate vectorized maps friendly to downstream autonomous driving tasks.
Experiments show that VectorMapNet achieves strong map learning performance on both the nuScenes and Argoverse 2 datasets.
arXiv Detail & Related papers (2022-06-17T17:57:13Z) - Consistency Regularization for Deep Face Anti-Spoofing [69.70647782777051]
Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems.
Motivated by this exciting observation, we conjecture that encouraging feature consistency of different views may be a promising way to boost FAS models.
We propose Embedding-level and Prediction-level Consistency Regularization (EPCR) to enhance FAS.
arXiv Detail & Related papers (2021-11-24T08:03:48Z) - Light Field Saliency Detection with Dual Local Graph Learning
and Reciprocative Guidance [148.9832328803202]
We model the information fusion within the focal stack via graph networks.
We build a novel dual graph model to guide the focal stack fusion process using all-focus patterns.
arXiv Detail & Related papers (2021-10-02T00:54:39Z) - CAMERAS: Enhanced Resolution And Sanity preserving Class Activation
Mapping for image saliency [61.40511574314069]
Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input.
We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors.
arXiv Detail & Related papers (2021-06-20T08:20:56Z) - Rethinking Localization Map: Towards Accurate Object Perception with
Self-Enhancement Maps [78.2581910688094]
This work introduces a novel self-enhancement method to harvest accurate object localization maps and object boundaries with only category labels as supervision.
In particular, the proposed Self-Enhancement Maps achieve the state-of-the-art localization accuracy of 54.88% on ILSVRC.
arXiv Detail & Related papers (2020-06-09T12:35:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.