FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models
- URL: http://arxiv.org/abs/2507.23325v1
- Date: Thu, 31 Jul 2025 08:12:56 GMT
- Title: FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models
- Authors: Yiming Yang, Hongbin Lin, Yueru Luo, Suzhong Fu, Chao Zheng, Xinrui Yan, Shuqi Mei, Kun Tang, Shuguang Cui, Zhen Li
- Abstract summary: Lane segment topology reasoning provides comprehensive bird's-eye view (BEV) road scene understanding. A stream-based temporal propagation method has demonstrated promising results by incorporating temporal cues at both the query and BEV levels. We propose FASTopoWM, a novel fast-slow lane segment topology reasoning framework augmented with latent world models.
- Score: 53.91899980806139
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lane segment topology reasoning provides comprehensive bird's-eye view (BEV) road scene understanding, which can serve as a key perception module in planning-oriented end-to-end autonomous driving systems. Existing lane topology reasoning methods often fall short in effectively leveraging temporal information to enhance detection and reasoning performance. Recently, a stream-based temporal propagation method has demonstrated promising results by incorporating temporal cues at both the query and BEV levels. However, it remains limited by over-reliance on historical queries, vulnerability to pose estimation failures, and insufficient temporal propagation. To overcome these limitations, we propose FASTopoWM, a novel fast-slow lane segment topology reasoning framework augmented with latent world models. To reduce the impact of pose estimation failures, this unified framework enables parallel supervision of both historical and newly initialized queries, facilitating mutual reinforcement between the fast and slow systems. Furthermore, we introduce latent query and BEV world models conditioned on the action latent to propagate the state representations from past observations to the current timestep. This design substantially improves the performance of temporal perception within the slow pipeline. Extensive experiments on the OpenLane-V2 benchmark demonstrate that FASTopoWM outperforms state-of-the-art methods in both lane segment detection (37.4% vs. 33.6% on mAP) and centerline perception (46.3% vs. 41.5% on OLS).
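The fast-slow query idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the class `QueryWorldModel`, the function `build_query_sets`, the residual linear update, the zero-initialized fast queries, and all dimensions are hypothetical choices made only for illustration, and the action latent is assumed to be given. The sketch shows historical queries being propagated to the current timestep by an action-conditioned latent model (slow path) alongside newly initialized queries that ignore history (fast path); both sets could then be decoded and supervised in parallel.

```python
import random


def linear(weights, vec):
    """Dense layer without bias: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class QueryWorldModel:
    """Hypothetical latent world model over queries (illustrative only)."""

    def __init__(self, query_dim, action_dim, seed=0):
        rng = random.Random(seed)
        # Random weights stand in for parameters that would be learned.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(query_dim + action_dim)]
            for _ in range(query_dim)
        ]

    def propagate(self, past_query, action_latent):
        # Residual update: past state plus an action-conditioned delta.
        delta = linear(self.weights, past_query + action_latent)
        return [q + d for q, d in zip(past_query, delta)]


def build_query_sets(past_queries, action_latent, world_model, num_new, query_dim):
    # Slow path: historical queries moved to the current timestep.
    slow = [world_model.propagate(q, action_latent) for q in past_queries]
    # Fast path: fresh queries with no dependence on history
    # (zeros here; an assumption, not the paper's initialization).
    fast = [[0.0] * query_dim for _ in range(num_new)]
    return fast, slow


wm = QueryWorldModel(query_dim=4, action_dim=2)
past = [[1.0, 0.0, 0.5, -0.5], [0.2, 0.1, 0.0, 0.3]]
fast, slow = build_query_sets(past, [0.1, -0.2], wm, num_new=3, query_dim=4)
print(len(fast), len(slow), len(slow[0]))  # prints: 3 2 4
```

Keeping the fast queries independent of the propagated ones is what would let the fast path still produce output when the history or pose information is unreliable, which is the failure case the abstract highlights.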
Related papers
- Occupancy Learning with Spatiotemporal Memory [39.41175479685905]
We propose a scene-level occupancy representation learning framework that effectively learns 3D occupancy features with temporal consistency. Our method significantly enhances the temporal representation learned for 3D occupancy prediction tasks by exploiting the temporal dependency between multi-frame inputs.
arXiv Detail & Related papers (2025-08-06T17:59:52Z)
- An Information-Theoretic Analysis for Federated Learning under Concept Drift [8.343774282372337]
This paper analyzes performance under concept drift using information theory and proposes an algorithm to mitigate the performance degradation. We study three drift patterns (periodic, gradual, and random) and their impact on FL performance. Inspired by this, we propose an algorithm that regularizes the empirical risk minimization approach with KL divergence and mutual information, thereby enhancing long-term performance.
arXiv Detail & Related papers (2025-06-26T06:25:15Z)
- Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction [62.69089767730514]
We present GDFusion, a temporal fusion method for vision-based 3D semantic occupancy prediction (VisionOcc). It opens up the underexplored aspects of temporal fusion within the VisionOcc framework, focusing on both temporal cues and fusion strategies.
arXiv Detail & Related papers (2025-04-17T14:05:33Z)
- Deflickering Vision-Based Occupancy Networks through Lightweight Spatio-Temporal Correlation [15.726401007342087]
Vision-based occupancy networks (VONs) provide an end-to-end solution for reconstructing 3D environments in autonomous driving. Recent approaches have incorporated historical data to mitigate the issue, but they often incur high computational costs and may introduce noisy information that interferes with object detection. We propose OccLinker, a novel plugin framework designed to seamlessly integrate with existing VONs for boosting performance. Our method efficiently consolidates historical static and motion cues, learns sparse latent correlations with current features through a dual cross-attention mechanism, and produces correction occupancy components to refine the base network's predictions.
arXiv Detail & Related papers (2025-02-21T13:07:45Z)
- ResFlow: Fine-tuning Residual Optical Flow for Event-based High Temporal Resolution Motion Estimation [50.80115710105251]
Event cameras hold significant promise for high-temporal-resolution (HTR) motion estimation. We propose a residual-based paradigm for estimating HTR optical flow with event data.
arXiv Detail & Related papers (2024-12-12T09:35:47Z)
- Rethinking Spatio-Temporal Transformer for Traffic Prediction: Multi-level Multi-view Augmented Learning Framework [4.773547922851949]
Traffic prediction is a challenging spatio-temporal forecasting problem that involves highly complex semantic correlations.
This paper proposes a Multi-level Multi-view Augmented Spatio-temporal Transformer (LVST) for traffic prediction.
arXiv Detail & Related papers (2024-06-17T07:36:57Z)
- USTEP: Spatio-Temporal Predictive Learning under A Unified View [62.58464029270846]
We introduce USTEP (Unified Spatio-TEmporal Predictive learning), an innovative framework that reconciles the recurrent-based and recurrent-free methods by integrating both micro-temporal and macro-temporal scales.
arXiv Detail & Related papers (2023-10-09T16:17:42Z)
- Temporal Context Aggregation Network for Temporal Action Proposal Refinement [93.03730692520999]
Temporal action proposal generation is a challenging yet important task in the video understanding field.
Current methods still suffer from inaccurate temporal boundaries and inferior confidence used for retrieval.
We propose TCANet to generate high-quality action proposals through "local and global" temporal context aggregation.
arXiv Detail & Related papers (2021-03-24T12:34:49Z)
- Physics-informed Tensor-train ConvLSTM for Volumetric Velocity Forecasting of Loop Current [6.016102212809306]
Loop Current is a weekly forecast of velocity, vertical structure, and duration of the Loop Current (LC) in the Gulf of Mexico.
This paper shows its effectiveness beyond video prediction, extending it to a novel Physics-informed Tensor-train ConvLSTM for forecasting temporal sequences of 3D geospatial data.
arXiv Detail & Related papers (2020-08-04T19:55:57Z)
- Supporting Optimal Phase Space Reconstructions Using Neural Network Architecture for Time Series Modeling [68.8204255655161]
We propose an artificial neural network with a mechanism to implicitly learn the properties of phase spaces.
Our approach is either as competitive as or better than most state-of-the-art strategies.
arXiv Detail & Related papers (2020-06-19T21:04:47Z)