Persona-aware and Explainable Bikeability Assessment: A Vision-Language Model Approach
- URL: http://arxiv.org/abs/2601.03534v1
- Date: Wed, 07 Jan 2026 02:46:51 GMT
- Authors: Yilong Dai, Ziyi Wang, Chenguang Wang, Kexin Zhou, Yiheng Qian, Susu Xu, Xiang Yan
- Abstract summary: This paper proposes a persona-aware Vision-Language Model framework for bikeability assessment. We developed a panoramic image-based crowdsourcing system and collected 12,400 persona-conditioned assessments from 427 cyclists. Experiment results show that the proposed framework offers competitive bikeability rating prediction.
- Score: 8.652496663871172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bikeability assessment is essential for advancing sustainable urban transportation and creating cyclist-friendly cities, and it requires incorporating users' perceptions of safety and comfort. Yet existing perception-based bikeability assessment approaches face key limitations in capturing the complexity of road environments and adequately accounting for heterogeneity in subjective user perceptions. This paper proposes a persona-aware Vision-Language Model framework for bikeability assessment with three novel contributions: (i) theory-grounded persona conditioning based on established cyclist typology that generates persona-specific explanations via chain-of-thought reasoning; (ii) multi-granularity supervised fine-tuning that combines scarce expert-annotated reasoning with abundant user ratings for joint prediction and explainable assessment; and (iii) AI-enabled data augmentation that creates controlled paired data to isolate infrastructure variable impacts. To test and validate this framework, we developed a panoramic image-based crowdsourcing system and collected 12,400 persona-conditioned assessments from 427 cyclists. Experiment results show that the proposed framework offers competitive bikeability rating prediction while uniquely enabling explainable factor attribution.
Related papers
- From Steering to Pedalling: Do Autonomous Driving VLMs Generalize to Cyclist-Assistive Spatial Perception and Planning? [3.437656066916039]
Vision-language models (VLMs) have demonstrated strong performance on autonomous driving benchmarks. Existing evaluations are predominantly vehicle-centric and fail to assess perception and reasoning from a cyclist-centric viewpoint. We introduce CyclingVQA, a diagnostic benchmark designed to probe perception, temporal understanding, and traffic-rule-to-lane reasoning from a cyclist's perspective.
arXiv Detail & Related papers (2026-02-11T12:01:37Z)
- URBAN-SPIN: A street-level bikeability index to inform design implementations in historical city centres [2.770226625653906]
This study develops a perception-led, typology-based, and data-integrated framework. It explicitly models street typologies and their sub-classifications to evaluate how visual and spatial configurations shape cycling experience. The framework offers a transferable model for evaluating and improving cycling conditions in heritage cities.
arXiv Detail & Related papers (2026-01-30T23:22:11Z)
- StreetDesignAI: A Multi-Persona Evaluation System for Inclusive Infrastructure Design [8.314136104243735]
We present StreetDesignAI, an interactive system that enables designers to ground evaluation in street context through imagery and map data. A study with 26 transportation professionals demonstrates that structured multi-perspective feedback significantly improves designers' understanding of diverse user perspectives.
arXiv Detail & Related papers (2026-01-22T05:53:05Z)
- DriveCritic: Towards Context-Aware, Human-Aligned Evaluation for Autonomous Driving with Vision-Language Models [24.168614747778538]
We introduce DriveCritic, a novel framework featuring two key contributions. The dataset is a curated collection of challenging scenarios where context is critical for correct judgment. The DriveCritic model learns to adjudicate between trajectory pairs by integrating visual and symbolic context.
arXiv Detail & Related papers (2025-10-15T03:00:38Z)
- MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving [85.04826012938642]
MetAdv is a novel adversarial testing platform that enables realistic, dynamic, and interactive evaluation. It supports flexible 3D vehicle modeling and seamless transitions between simulated and physical environments. It enables real-time capture of physiological signals and behavioral feedback from drivers.
arXiv Detail & Related papers (2025-08-04T03:07:54Z)
- Interpretable Multimodal Framework for Human-Centered Street Assessment: Integrating Visual-Language Models for Perceptual Urban Diagnostics [0.0]
This study introduces a novel Multimodal Street Evaluation Framework (MSEF). We fine-tune the framework using LoRA and P-Tuning v2 for parameter-efficient adaptation. The model achieves an F1 score of 0.84 on objective features and 89.3 percent agreement with aggregated resident perceptions.
arXiv Detail & Related papers (2025-06-05T14:34:04Z)
- Objective Bicycle Occlusion Level Classification using a Deformable Parts-Based Model [1.565361244756411]
Road safety is a critical challenge, particularly for cyclists, who are among the most vulnerable road users. This study aims to enhance road safety by proposing a novel benchmark for bicycle occlusion level classification using advanced computer vision techniques.
arXiv Detail & Related papers (2025-05-21T10:42:41Z)
- Which cycling environment appears safer? Learning cycling safety perceptions from pairwise image comparisons [2.3900828891729784]
Cycling is critical for cities to transition to more sustainable transport modes. Yet, safety concerns remain a critical deterrent for individuals to cycle. In this study, we tackle the problem of capturing and understanding how individuals perceive cycling risk. We base our approach on using pairwise comparisons of real-world images, repeatedly presenting respondents with pairs of road environments. We ask them to select the one they perceive as safer for cycling, if any. Using the collected data, we train a siamese-convolutional neural network using a multi-loss framework that learns from individuals' responses, learns preferences directly from images,
arXiv Detail & Related papers (2024-12-13T03:56:40Z)
- Traffic and Safety Rule Compliance of Humans in Diverse Driving Situations [48.924085579865334]
Analyzing human data is crucial for developing autonomous systems that replicate safe driving practices.
This paper presents a comparative evaluation of human compliance with traffic and safety rules across multiple trajectory prediction datasets.
arXiv Detail & Related papers (2024-11-04T09:21:00Z)
- Evaluating the effects of Data Sparsity on the Link-level Bicycling Volume Estimation: A Graph Convolutional Neural Network Approach [54.84957282120537]
We present the first study to utilize a Graph Convolutional Network (GCN) architecture to model link-level bicycling volumes. We benchmark it against traditional machine learning models, such as linear regression, support vector machines, and random forest. Our results show that the GCN model outperforms these traditional models in predicting Annual Average Daily Bicycle (AADB) counts.
arXiv Detail & Related papers (2024-10-11T04:53:18Z)
- Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving [55.93813178692077]
We present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms. We assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction. Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data.
arXiv Detail & Related papers (2024-05-27T17:59:39Z)
- OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping [84.65114565766596]
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure.
OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes.
We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv Detail & Related papers (2023-04-20T16:31:22Z)
- Euro-PVI: Pedestrian Vehicle Interactions in Dense Urban Centers [126.81938540470847]
We propose Euro-PVI, a dataset of pedestrian and bicyclist trajectories.
In this work, we develop a joint inference model that learns an expressive multi-modal shared latent space across agents in the urban scene.
We achieve state of the art results on the nuScenes and Euro-PVI datasets demonstrating the importance of capturing interactions between ego-vehicle and pedestrians (bicyclists) for accurate predictions.
arXiv Detail & Related papers (2021-06-22T15:40:21Z)