From Steering to Pedalling: Do Autonomous Driving VLMs Generalize to Cyclist-Assistive Spatial Perception and Planning?
- URL: http://arxiv.org/abs/2602.10771v1
- Date: Wed, 11 Feb 2026 12:01:37 GMT
- Title: From Steering to Pedalling: Do Autonomous Driving VLMs Generalize to Cyclist-Assistive Spatial Perception and Planning?
- Authors: Krishna Kanth Nakka, Vedasri Nakka
- Abstract summary: Vision-language models (VLMs) have demonstrated strong performance on autonomous driving benchmarks. Existing evaluations are predominantly vehicle-centric and fail to assess perception and reasoning from a cyclist-centric viewpoint. We introduce CyclingVQA, a diagnostic benchmark designed to probe perception, spatio-temporal understanding, and traffic-rule-to-lane reasoning from a cyclist's perspective.
- Score: 3.437656066916039
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cyclists often encounter safety-critical situations in urban traffic, highlighting the need for assistive systems that support safe and informed decision-making. Recently, vision-language models (VLMs) have demonstrated strong performance on autonomous driving benchmarks, suggesting their potential for general traffic understanding and navigation-related reasoning. However, existing evaluations are predominantly vehicle-centric and fail to assess perception and reasoning from a cyclist-centric viewpoint. To address this gap, we introduce CyclingVQA, a diagnostic benchmark designed to probe perception, spatio-temporal understanding, and traffic-rule-to-lane reasoning from a cyclist's perspective. Evaluating 31+ recent VLMs spanning general-purpose, spatially enhanced, and autonomous-driving-specialized models, we find that current models demonstrate encouraging capabilities, while also revealing clear areas for improvement in cyclist-centric perception and reasoning, particularly in interpreting cyclist-specific traffic cues and associating signs with the correct navigational lanes. Notably, several driving-specialized models underperform strong generalist VLMs, indicating limited transfer from vehicle-centric training to cyclist-assistive scenarios. Finally, through systematic error analysis, we identify recurring failure modes to guide the development of more effective cyclist-assistive intelligent systems.
Related papers
- HetroD: A High-Fidelity Drone Dataset and Benchmark for Autonomous Driving in Heterogeneous Traffic [49.31491001465465]
HetroD is a dataset and benchmark for developing autonomous driving systems in heterogeneous environments. HetroD targets the critical challenge of navigating real-world heterogeneous traffic dominated by vulnerable road users (VRUs).
arXiv Detail & Related papers (2026-02-03T12:12:47Z)
- Persona-aware and Explainable Bikeability Assessment: A Vision-Language Model Approach [8.652496663871172]
This paper proposes a persona-aware Vision-Language Model framework for bikeability assessment. We developed a panoramic image-based crowdsourcing system and collected 12,400 persona-conditioned assessments from 427 cyclists. Experiment results show that the proposed framework offers competitive bikeability rating prediction.
arXiv Detail & Related papers (2026-01-07T02:46:51Z)
- Ethics-Aware Safe Reinforcement Learning for Rare-Event Risk Control in Interactive Urban Driving [1.2891210250935148]
We present a hierarchical Safe Reinforcement Learning framework that augments standard driving objectives with ethics-aware cost signals. A Safe RL agent is trained using a composite ethical risk cost, combining collision probability and harm severity, to generate high-level motion targets. A dynamic, risk-sensitive Prioritized Experience Replay mechanism amplifies learning from rare but critical, high-risk events.
arXiv Detail & Related papers (2025-08-19T14:24:02Z)
- Explaining Autonomous Vehicles with Intention-aware Policy Graphs [0.1398098625978622]
We propose a model-agnostic solution to provide teleological explanations for the behaviour of an autonomous vehicle in urban environments. Building on Intention-aware Policy Graphs, our approach enables the extraction of interpretable and reliable explanations of vehicle behaviour. We demonstrate the potential of these explanations to assess whether the vehicle operates within acceptable legal boundaries and to identify possible vulnerabilities in autonomous driving datasets and models.
arXiv Detail & Related papers (2025-05-13T09:58:32Z)
- Natural Reflection Backdoor Attack on Vision Language Model for Autonomous Driving [55.96227460521096]
Vision-Language Models (VLMs) have been integrated into autonomous driving systems to enhance reasoning capabilities. We propose a natural reflection-based backdoor attack targeting VLM systems in autonomous driving scenarios. Our findings uncover a new class of attacks that exploit the stringent real-time requirements of autonomous driving.
arXiv Detail & Related papers (2025-05-09T20:28:17Z)
- Exploring the Causality of End-to-End Autonomous Driving [57.631400236930375]
We propose a comprehensive approach to explore and analyze the causality of end-to-end autonomous driving.
Our work is the first to unveil the mystery of end-to-end autonomous driving and turn the black box into a white one.
arXiv Detail & Related papers (2024-07-09T04:56:11Z)
- Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction [69.29802752614677]
RouteFormer is a novel ego-trajectory prediction network combining GPS data, environmental context, and the driver's field-of-view. To tackle data scarcity and enhance diversity, we introduce GEM, a dataset of urban driving scenarios enriched with synchronized driver field-of-view and gaze data.
arXiv Detail & Related papers (2023-12-13T23:06:30Z)
- Camera-Radar Perception for Autonomous Vehicles and ADAS: Concepts, Datasets and Metrics [77.34726150561087]
This work presents a study of the current state of camera- and radar-based perception for ADAS and autonomous vehicles.
Concepts and characteristics related to both sensors, as well as to their fusion, are presented.
We give an overview of the Deep Learning-based detection and segmentation tasks, and the main datasets, metrics, challenges, and open questions in vehicle perception.
arXiv Detail & Related papers (2023-03-08T00:48:32Z)
- FBLNet: FeedBack Loop Network for Driver Attention Prediction [50.936478241688114]
Nonobjective driving experience is difficult to model, so a mechanism simulating the driver experience accumulation procedure is absent in existing methods. We propose a FeedBack Loop Network (FBLNet), which attempts to model the driving experience accumulation procedure. Our model exhibits a solid advantage over existing methods, achieving an outstanding performance improvement on two driver attention benchmark datasets.
arXiv Detail & Related papers (2022-12-05T08:25:09Z)
- CADRE: A Cascade Deep Reinforcement Learning Framework for Vision-based Autonomous Urban Driving [43.269130988225605]
Vision-based autonomous urban driving in dense traffic is quite challenging due to the complicated urban environment and the dynamics of the driving behaviors.
We present a novel CAscade Deep REinforcement learning framework, CADRE, to achieve model-free vision-based autonomous urban driving.
arXiv Detail & Related papers (2022-02-17T10:07:16Z)
- Probabilistic End-to-End Vehicle Navigation in Complex Dynamic Environments with Multimodal Sensor Fusion [16.018962965273495]
All-day and all-weather navigation is a critical capability for autonomous driving.
We propose a probabilistic driving model with multi-perception capability utilizing information from the camera, lidar, and radar.
The results suggest that our proposed model outperforms baselines and achieves excellent generalization performance in unseen environments.
arXiv Detail & Related papers (2020-05-05T03:48:10Z)
- Decoding pedestrian and automated vehicle interactions using immersive virtual reality and interpretable deep learning [6.982614422666432]
This study investigates pedestrian crossing behaviour, as an important element of urban dynamics that is expected to be affected by the presence of automated vehicles.
Pedestrian wait time behaviour is then analyzed using a data-driven Cox Proportional Hazards (CPH) model.
Results show that the presence of automated vehicles on roads, wider lane widths, high density on roads, limited sight distance, and lack of walking habits are the main contributing factors to longer wait times.
arXiv Detail & Related papers (2020-02-18T01:30:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.