Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping
- URL: http://arxiv.org/abs/2510.10660v1
- Date: Sun, 12 Oct 2025 15:33:45 GMT
- Title: Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping
- Authors: Hao Shan, Ruikai Li, Han Jiang, Yizhe Fan, Ziyang Yan, Bohan Li, Xiaoshuai Hao, Hao Zhao, Zhiyong Cui, Yilong Ren, Haiyang Yu
- Abstract summary: This paper presents the first comprehensive benchmark for evaluating the temporal stability of online HD mapping models. We propose a multi-dimensional stability evaluation framework with novel metrics for Presence, Localization, and Shape Stability. Our work highlights the importance of treating temporal stability as a core evaluation criterion alongside accuracy, advancing the development of more reliable autonomous driving systems.
- Score: 25.516502412129096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As one of the fundamental modules in autonomous driving, online high-definition (HD) maps have attracted significant attention due to their cost-effectiveness and real-time capabilities. Since vehicles always cruise in highly dynamic environments, spatial displacement of onboard sensors inevitably causes shifts in real-time HD mapping results, and such instability poses fundamental challenges for downstream tasks. However, existing online map construction models tend to prioritize improving each frame's mapping accuracy, while the mapping stability has not yet been systematically studied. To fill this gap, this paper presents the first comprehensive benchmark for evaluating the temporal stability of online HD mapping models. We propose a multi-dimensional stability evaluation framework with novel metrics for Presence, Localization, and Shape Stability, integrated into a unified mean Average Stability (mAS) score. Extensive experiments on 42 models and variants show that accuracy (mAP) and stability (mAS) represent largely independent performance dimensions. We further analyze the impact of key model design choices on both criteria, identifying architectural and training factors that contribute to high accuracy, high stability, or both. To encourage broader focus on stability, we will release a public benchmark. Our work highlights the importance of treating temporal stability as a core evaluation criterion alongside accuracy, advancing the development of more reliable autonomous driving systems. The benchmark toolkit, code, and models will be available at https://stablehdmap.github.io/.
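The abstract describes a unified mean Average Stability (mAS) score that integrates Presence, Localization, and Shape Stability metrics, mirroring how mAP aggregates per-class accuracy. The paper's exact sub-metric definitions are not given in this listing; as a hedged illustration only, assuming mAS is a weighted average of the three per-class sub-scores followed by a mean over map classes, the aggregation idea might be sketched as:

```python
# Hypothetical sketch of a mean Average Stability (mAS)-style aggregation.
# The Presence, Localization, and Shape sub-metrics themselves are placeholders
# here; only the averaging structure is illustrated, not the paper's method.
from statistics import mean

def average_stability(presence, localization, shape, weights=(1/3, 1/3, 1/3)):
    """Combine the three stability sub-scores (each assumed in [0, 1])
    for one map class into a single Average Stability (AS) value."""
    wp, wl, ws = weights
    return wp * presence + wl * localization + ws * shape

def mean_average_stability(per_class_scores):
    """Average the per-class AS values, analogous to mAP averaging AP."""
    return mean(average_stability(*s) for s in per_class_scores)

# Example with three hypothetical map classes (e.g. divider, boundary, crossing).
scores = [(0.9, 0.8, 0.7), (0.6, 0.5, 0.4), (1.0, 0.9, 0.8)]
print(round(mean_average_stability(scores), 4))  # equal-weight mean of 0.8, 0.5, 0.9
```

Because mAS is computed independently of per-frame accuracy, a model can score high on mAP while its per-class stability terms fluctuate, which is consistent with the paper's finding that the two dimensions are largely independent.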
Related papers
- Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates. SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence. Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z) - LILAD: Learning In-context Lyapunov-stable Adaptive Dynamics Models [4.66260462241022]
LILAD is a novel framework for system identification that jointly guarantees stability and adaptability. We evaluate LILAD on benchmark autonomous systems and demonstrate that it outperforms adaptive, robust, and non-adaptive baselines in predictive accuracy.
arXiv Detail & Related papers (2025-11-26T19:20:49Z) - ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task to predict the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z) - Prompt Stability in Code LLMs: Measuring Sensitivity across Emotion- and Personality-Driven Variations [40.12950482269347]
We present PromptSE, a framework that creates semantically equivalent prompt variants with emotion and personality templates. Our study shows that performance and stability behave as largely decoupled optimization objectives. PromptSE enables practitioners to quantify performance-stability trade-offs for deployment and model selection.
arXiv Detail & Related papers (2025-09-17T04:17:42Z) - RoHOI: Robustness Benchmark for Human-Object Interaction Detection [84.78366452133514]
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support. We introduce the first robustness benchmark for HOI detection, evaluating model resilience under diverse challenges. Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
arXiv Detail & Related papers (2025-07-12T01:58:04Z) - Prompt Stability Matters: Evaluating and Optimizing Auto-Generated Prompt in General-Purpose Systems [19.59294293070619]
We introduce semantic stability as a criterion for assessing the consistency of model responses. We develop the first stability-aware general-purpose prompt generation system. Our work offers a new perspective on prompt design and contributes practical tools for building more trustworthy general-purpose systems.
arXiv Detail & Related papers (2025-05-19T03:28:33Z) - Towards Robust Stability Prediction in Smart Grids: GAN-based Approach under Data Constraints and Adversarial Challenges [53.2306792009435]
This paper introduces a novel framework for detecting instability in smart grids using only stable data. It achieves up to 98.1% accuracy in predicting grid stability and 98.9% in detecting adversarial attacks. Implemented on a single-board computer, it enables real-time decision-making with an average response time of under 7 ms.
arXiv Detail & Related papers (2025-01-27T20:48:25Z) - Large Continual Instruction Assistant [59.585544987096974]
Continual Instruction Tuning (CIT) is adopted to instruct Large Models to follow human intent, data by data. Existing gradient updates can heavily degrade performance on previous datasets during the CIT process. We propose a general continual instruction tuning framework to address this challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z) - DRIVE: Dependable Robust Interpretable Visionary Ensemble Framework in Autonomous Driving [1.4104119587524289]
Recent advancements in autonomous driving have seen a paradigm shift towards end-to-end learning paradigms.
These models often sacrifice interpretability, posing significant challenges to trust, safety, and regulatory compliance.
We introduce DRIVE, a comprehensive framework designed to improve the dependability and stability of explanations in end-to-end unsupervised driving models.
arXiv Detail & Related papers (2024-09-16T14:40:47Z) - Towards Stable 3D Object Detection [64.49059005467817]
Stability Index (SI) is a new metric that can comprehensively evaluate the stability of 3D detectors in terms of confidence, box localization, extent, and heading.
To help models improve their stability, we introduce a general and effective training strategy called Prediction Consistency Learning (PCL).
PCL essentially encourages the prediction consistency of the same objects under different timestamps and augmentations, leading to enhanced detection stability.
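As summarized above, PCL penalizes disagreement between predictions of the same object across timestamps and augmentations. The paper's exact formulation is not reproduced in this listing; as a hedged sketch only, one simple way to express such a consistency penalty is a mean squared difference over the prediction attributes (confidence, box center, extent, heading) of a matched object pair:

```python
# Hedged sketch of a PCL-style consistency penalty (not the paper's exact loss).
# Each prediction vector packs the attributes of one detected object, e.g.
# [confidence, x, y, length, width, heading]; the two vectors come from the
# same object observed at different timestamps or under different augmentations.

def consistency_loss(pred_a, pred_b):
    """Mean squared difference between two matched prediction vectors.
    Zero when the predictions agree exactly; grows with temporal jitter."""
    assert len(pred_a) == len(pred_b), "predictions must share the same layout"
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)

# Identical predictions incur no penalty; jittered ones are penalized.
stable = consistency_loss([0.9, 1.0, 2.0], [0.9, 1.0, 2.0])
jitter = consistency_loss([0.9, 1.0, 2.0], [0.7, 1.2, 1.9])
print(stable, jitter)
```

In training, such a term would be added to the usual detection loss so that the model is rewarded both for per-frame accuracy and for temporally consistent outputs, which is the trade-off the benchmark above is designed to expose.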
arXiv Detail & Related papers (2024-07-05T07:17:58Z) - Model Stability with Continuous Data Updates [2.439909645714735]
We study the "stability" of machine learning (ML) models within the context of larger, complex NLP systems.
We find that model design choices, including network architecture and input representation, have a critical impact on stability.
We recommend ML model designers account for trade-offs in accuracy and jitter when making modeling choices.
arXiv Detail & Related papers (2022-01-14T22:11:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.