Related papers: Persona-aware and Explainable Bikeability Assessment: A Vision-Language Model Approach

URL: http://arxiv.org/abs/2601.03534v1
Date: Wed, 07 Jan 2026 02:46:51 GMT
Title: Persona-aware and Explainable Bikeability Assessment: A Vision-Language Model Approach
Authors: Yilong Dai, Ziyi Wang, Chenguang Wang, Kexin Zhou, Yiheng Qian, Susu Xu, Xiang Yan
Abstract summary: This paper proposes a persona-aware Vision-Language Model framework for bikeability assessment. We developed a panoramic image-based crowdsourcing system and collected 12,400 persona-conditioned assessments from 427 cyclists. Experiment results show that the proposed framework offers competitive bikeability rating prediction.
Score: 8.652496663871172
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bikeability assessment is essential for advancing sustainable urban transportation and creating cyclist-friendly cities, and it requires incorporating users' perceptions of safety and comfort. Yet existing perception-based bikeability assessment approaches face key limitations in capturing the complexity of road environments and adequately accounting for heterogeneity in subjective user perceptions. This paper proposes a persona-aware Vision-Language Model framework for bikeability assessment with three novel contributions: (i) theory-grounded persona conditioning based on established cyclist typology that generates persona-specific explanations via chain-of-thought reasoning; (ii) multi-granularity supervised fine-tuning that combines scarce expert-annotated reasoning with abundant user ratings for joint prediction and explainable assessment; and (iii) AI-enabled data augmentation that creates controlled paired data to isolate infrastructure variable impacts. To test and validate this framework, we developed a panoramic image-based crowdsourcing system and collected 12,400 persona-conditioned assessments from 427 cyclists. Experiment results show that the proposed framework offers competitive bikeability rating prediction while uniquely enabling explainable factor attribution.

Related papers

From Steering to Pedalling: Do Autonomous Driving VLMs Generalize to Cyclist-Assistive Spatial Perception and Planning? [3.437656066916039]
Vision-language models (VLMs) have demonstrated strong performance on autonomous driving benchmarks. Existing evaluations are predominantly vehicle-centric and fail to assess perception and reasoning from a cyclist-centric viewpoint. We introduce CyclingVQA, a diagnostic benchmark designed to probe perception, temporal understanding, and traffic-rule-to-lane reasoning from a cyclist's perspective.
arXiv Detail & Related papers (2026-02-11T12:01:37Z)
URBAN-SPIN: A street-level bikeability index to inform design implementations in historical city centres [2.770226625653906]
This study develops a perception-led, typology-based, and data-integrated framework. It explicitly models street typologies and their sub-classifications to evaluate how visual and spatial configurations shape cycling experience. The framework offers a transferable model for evaluating and improving cycling conditions in heritage cities.
arXiv Detail & Related papers (2026-01-30T23:22:11Z)
StreetDesignAI: A Multi-Persona Evaluation System for Inclusive Infrastructure Design [8.314136104243735]
We present StreetDesignAI, an interactive system that enables designers to ground evaluation in street context through imagery and map data. A study with 26 transportation professionals demonstrates that structured multi-perspective feedback significantly improves designers' understanding of diverse user perspectives.
arXiv Detail & Related papers (2026-01-22T05:53:05Z)
DriveCritic: Towards Context-Aware, Human-Aligned Evaluation for Autonomous Driving with Vision-Language Models [24.168614747778538]
We introduce DriveCritic, a novel framework featuring two key contributions. The dataset is a curated collection of challenging scenarios where context is critical for correct judgment. The DriveCritic model learns to adjudicate between trajectory pairs by integrating visual and symbolic context.
arXiv Detail & Related papers (2025-10-15T03:00:38Z)
MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving [85.04826012938642]
MetAdv is a novel adversarial testing platform that enables realistic, dynamic, and interactive evaluation. It supports flexible 3D vehicle modeling and seamless transitions between simulated and physical environments. It enables real-time capture of physiological signals and behavioral feedback from drivers.
arXiv Detail & Related papers (2025-08-04T03:07:54Z)
Interpretable Multimodal Framework for Human-Centered Street Assessment: Integrating Visual-Language Models for Perceptual Urban Diagnostics [0.0]
This study introduces a novel Multimodal Street Evaluation Framework (MSEF). We fine-tune the framework using LoRA and P-Tuning v2 for parameter-efficient adaptation. The model achieves an F1 score of 0.84 on objective features and 89.3 percent agreement with aggregated resident perceptions.
arXiv Detail & Related papers (2025-06-05T14:34:04Z)
Objective Bicycle Occlusion Level Classification using a Deformable Parts-Based Model [1.565361244756411]
Road safety is a critical challenge, particularly for cyclists, who are among the most vulnerable road users. This study aims to enhance road safety by proposing a novel benchmark for bicycle occlusion level classification using advanced computer vision techniques.
arXiv Detail & Related papers (2025-05-21T10:42:41Z)
Which cycling environment appears safer? Learning cycling safety perceptions from pairwise image comparisons [2.3900828891729784]
Cycling is critical for cities to transition to more sustainable transport modes. Yet, safety concerns remain a critical deterrent for individuals to cycle. In this study, we tackle the problem of capturing and understanding how individuals perceive cycling risk. We base our approach on using pairwise comparisons of real-world images, repeatedly presenting respondents with pairs of road environments. We ask them to select the one they perceive as safer for cycling, if any. Using the collected data, we train a siamese-convolutional neural network using a multi-loss framework that learns from individuals' responses, learns preferences directly from images,
arXiv Detail & Related papers (2024-12-13T03:56:40Z)
Traffic and Safety Rule Compliance of Humans in Diverse Driving Situations [48.924085579865334]
Analyzing human data is crucial for developing autonomous systems that replicate safe driving practices. This paper presents a comparative evaluation of human compliance with traffic and safety rules across multiple trajectory prediction datasets.
arXiv Detail & Related papers (2024-11-04T09:21:00Z)
Evaluating the effects of Data Sparsity on the Link-level Bicycling Volume Estimation: A Graph Convolutional Neural Network Approach [54.84957282120537]
We present the first study to utilize a Graph Convolutional Network (GCN) architecture to model link-level bicycling volumes. We benchmark it against traditional machine learning models, such as linear regression, support vector machines, and random forest. Our results show that the GCN model outperforms these traditional models in predicting Annual Average Daily Bicycle (AADB) counts.
arXiv Detail & Related papers (2024-10-11T04:53:18Z)
Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving [55.93813178692077]
We present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms. We assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction. Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data.
arXiv Detail & Related papers (2024-05-27T17:59:39Z)
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping [84.65114565766596]
We present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure. OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes. We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
arXiv Detail & Related papers (2023-04-20T16:31:22Z)
Euro-PVI: Pedestrian Vehicle Interactions in Dense Urban Centers [126.81938540470847]
We propose Euro-PVI, a dataset of pedestrian and bicyclist trajectories. In this work, we develop a joint inference model that learns an expressive multi-modal shared latent space across agents in the urban scene. We achieve state of the art results on the nuScenes and Euro-PVI datasets demonstrating the importance of capturing interactions between ego-vehicle and pedestrians (bicyclists) for accurate predictions.
arXiv Detail & Related papers (2021-06-22T15:40:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.