Learning Global Representation from Queries for Vectorized HD Map Construction
- URL: http://arxiv.org/abs/2510.06969v1
- Date: Wed, 08 Oct 2025 12:56:08 GMT
- Title: Learning Global Representation from Queries for Vectorized HD Map Construction
- Authors: Shoumeng Qiu, Xinrun Li, Yang Long, Xiangyang Xue, Varun Ojha, Jian Pu
- Abstract summary: We propose MapGR (Global Representation learning for HD Map construction). A Global Representation Learning (GRL) module encourages the distribution of all queries to better align with the global map. A Global Representation Guidance (GRG) module endows each individual query with explicit, global-level contextual information to facilitate its optimization.
- Score: 37.400007014018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The online construction of vectorized high-definition (HD) maps is a cornerstone of modern autonomous driving systems. State-of-the-art approaches, particularly those based on the DETR framework, formulate this as an instance detection problem. However, their reliance on independent, learnable object queries results in a predominantly local query perspective, neglecting the inherent global representation within HD maps. In this work, we propose MapGR (Global Representation learning for HD Map construction), an architecture designed to learn and utilize global representations from queries. Our method introduces two synergistic modules: a Global Representation Learning (GRL) module, which encourages the distribution of all queries to better align with the global map through a carefully designed holistic segmentation task, and a Global Representation Guidance (GRG) module, which endows each individual query with explicit, global-level contextual information to facilitate its optimization. Evaluations on the nuScenes and Argoverse2 datasets validate the efficacy of our approach, demonstrating substantial improvements in mean Average Precision (mAP) compared to leading baselines.
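The two modules named in the abstract can be pictured with a small toy sketch. This is not the authors' implementation: the abstract does not say how the global representation is formed or injected, so mean pooling, the dot-product segmentation head, the additive guidance, and every name below (`mean_pool`, `segmentation_logits`, `guide_queries`, `weight`) are assumptions chosen only to show the data flow from a set of queries to one global vector and back.

```python
import random

def mean_pool(queries):
    """Average N query vectors (each of length D) into one global vector.
    Assumption: mean pooling stands in for whatever aggregation GRL uses."""
    n = len(queries)
    dim = len(queries[0])
    return [sum(q[d] for q in queries) / n for d in range(dim)]

def segmentation_logits(global_vec, bev_features):
    """Dot the global vector with each BEV cell feature to get an H x W
    logit map. Assumption: a stand-in for the holistic segmentation task."""
    return [[sum(g * f for g, f in zip(global_vec, cell)) for cell in row]
            for row in bev_features]

def guide_queries(queries, global_vec, weight=0.5):
    """Add a scaled copy of the global vector to every query.
    Assumption: additive fusion stands in for the GRG module."""
    return [[q_d + weight * g_d for q_d, g_d in zip(q, global_vec)]
            for q in queries]

# Toy sizes: 4 queries of dimension 8, and a 3 x 5 BEV feature grid.
random.seed(0)
N, D, H, W = 4, 8, 3, 5
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
bev = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(W)]
       for _ in range(H)]

global_vec = mean_pool(queries)                 # one vector summarising all queries
logits = segmentation_logits(global_vec, bev)   # H x W map supervised in training
guided = guide_queries(queries, global_vec)     # queries enriched with global context
```

In a trained model the logit map would presumably be supervised against a rasterized map mask, so that the pooled vector is pushed to describe the whole scene; the sketch only checks that the shapes line up.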
Related papers
- What matters for Representation Alignment: Global Information or Spatial Structure? [64.67092609921816]
Representation alignment (REPA) guides generative training by distilling representations from a strong, pretrained vision encoder to intermediate diffusion features. We investigate a fundamental question: what aspect of the target representation matters for generation, its global semantic information or its spatial structure? We replace the standard projection layer in REPA with a simple convolution layer and introduce a spatial normalization layer for the external representation.
arXiv Detail & Related papers (2025-12-11T16:39:53Z) - Bridging the Gap Between Sparsity and Redundancy: A Dual-Decoding Framework with Global Context for Map Inference [1.6891753537675143]
We propose DGMap, a dual-decoding framework with global context awareness. By integrating global semantic context with local geometric features, DGMap improves keypoint detection accuracy. A Global Context-aware Relation Prediction module suppresses false connections in dense-trajectory regions.
arXiv Detail & Related papers (2025-09-15T09:31:38Z) - InteractionMap: Improving Online Vectorized HDMap Construction with Interaction [0.4551615447454768]
State-of-the-art map vectorization methods are mainly based on a DETR-like framework to generate HD maps in an end-to-end manner. In this paper, we propose InteractionMap, which improves previous map vectorization methods by fully leveraging local-to-global information interaction.
arXiv Detail & Related papers (2025-03-27T16:23:15Z) - MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction [75.93907511203317]
We propose MGMapNet (Multi-Granularity Map Network) to model map elements with a multi-granularity representation.
The proposed MGMapNet achieves state-of-the-art performance, surpassing MapTRv2 by 5.3 mAP on nuScenes and 4.4 mAP on Argoverse2, respectively.
arXiv Detail & Related papers (2024-10-10T09:05:23Z) - Jointly Learning Representations for Map Entities via Heterogeneous Graph Contrastive Learning [38.415692986360995]
We propose a novel method named HOME-GCL for learning representations of multiple categories of map entities.
Our approach utilizes a heterogeneous map entity graph (HOME graph) that integrates both road segments and land parcels into a unified framework.
To the best of our knowledge, HOME-GCL is the first attempt to jointly learn representations for road segments and land parcels using a unified model.
arXiv Detail & Related papers (2024-02-09T01:47:18Z) - Bilateral Reference for High-Resolution Dichotomous Image Segmentation [109.35828258964557]
We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). Within the RM, we utilize BiRef for the reconstruction process, where hierarchical patches of images provide the source reference and gradient maps serve as the target reference.
arXiv Detail & Related papers (2024-01-07T07:56:47Z) - Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z) - Double Graphs Regularized Multi-view Subspace Clustering [15.52467509308717]
We propose a novel Double Graphs Regularized Multi-view Subspace Clustering (DGRMSC) method.
It aims to harness both global and local structural information of multi-view data in a unified framework.
arXiv Detail & Related papers (2022-09-30T00:16:42Z) - Grounding Visual Representations with Texts for Domain Generalization [9.554646174100123]
Cross-modality supervision can be successfully used to ground domain-invariant visual representations.
Our proposed method achieves state-of-the-art results and ranks 1st in average performance for five multi-domain datasets.
arXiv Detail & Related papers (2022-07-21T03:43:38Z) - Omni-Granular Ego-Semantic Propagation for Self-Supervised Graph Representation Learning [6.128446481571702]
Unsupervised/self-supervised graph representation learning is critical for downstream node- and graph-level classification tasks.
We introduce instance-adaptive global-aware ego-semantic descriptors.
The descriptors can be explicitly integrated into local graph convolution as new neighbor nodes.
arXiv Detail & Related papers (2022-05-31T12:31:33Z) - Fully Self-Supervised Learning for Semantic Segmentation [46.6602159197283]
We present a fully self-supervised framework for semantic segmentation (FS4).
We propose a bootstrapped training scheme for semantic segmentation, which fully leverages global semantic knowledge for self-supervision.
We evaluate our method on the large-scale COCO-Stuff dataset and achieve an improvement of 7.19 mIoU on both things and stuff objects.
arXiv Detail & Related papers (2022-02-24T09:38:22Z) - Self-supervised Graph-level Representation Learning with Local and Global Structure [71.45196938842608]
We propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning.
Besides preserving the local similarities, GraphLoG introduces the hierarchical prototypes to capture the global semantic clusters.
An efficient online expectation-maximization (EM) algorithm is further developed for learning the model.
arXiv Detail & Related papers (2021-06-08T05:25:38Z) - Multi-Level Graph Convolutional Network with Automatic Graph Learning for Hyperspectral Image Classification [63.56018768401328]
We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification.
By employing an attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions.
Our MGCN-AGL encodes the long range dependencies among image regions based on the expressive representations that have been produced at local level.
arXiv Detail & Related papers (2020-09-19T09:26:20Z)