Scaling Traffic Insights with AI and Language Model-Powered Camera Systems for Data-Driven Transportation Decision Making
- URL: http://arxiv.org/abs/2510.09981v1
- Date: Sat, 11 Oct 2025 03:18:42 GMT
- Title: Scaling Traffic Insights with AI and Language Model-Powered Camera Systems for Data-Driven Transportation Decision Making
- Authors: Fan Zuo, Donglin Zhou, Jingqin Gao, Kaan Ozbay
- Abstract summary: This study presents an end-to-end AI-based framework for high-resolution, longitudinal analysis at scale. A fine-tuned YOLOv11 model, trained on localized urban scenes, extracts multimodal traffic density and classification metrics in real time. We validated the system using over 9 million images from roughly 1,000 traffic cameras during the early rollout of NYC congestion pricing in 2025.
- Score: 3.0273878903284266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate, scalable traffic monitoring is critical for real-time and long-term transportation management, particularly during disruptions such as natural disasters, large construction projects, or major policy changes like New York City's first-in-the-nation congestion pricing program. However, widespread sensor deployment remains limited due to high installation, maintenance, and data management costs. While traffic cameras offer a cost-effective alternative, existing video analytics struggle with dynamic camera viewpoints and massive data volumes from large camera networks. This study presents an end-to-end AI-based framework leveraging existing traffic camera infrastructure for high-resolution, longitudinal analysis at scale. A fine-tuned YOLOv11 model, trained on localized urban scenes, extracts multimodal traffic density and classification metrics in real time. To address inconsistencies from non-stationary pan-tilt-zoom cameras, we introduce a novel graph-based viewpoint normalization method. A domain-specific large language model was also integrated to process massive data from a 24/7 video stream and generate frequent, automated summaries of evolving traffic patterns, a task far exceeding manual capabilities. We validated the system using over 9 million images from roughly 1,000 traffic cameras during the early rollout of NYC congestion pricing in 2025. Results show a 9% decline in weekday passenger vehicle density within the Congestion Relief Zone, early truck volume reductions with signs of rebound, and consistent increases in pedestrian and cyclist activity at corridor and zonal scales. Experiments showed that example-based prompts improved the LLM's numerical accuracy and reduced hallucinations. These findings demonstrate the framework's potential as a practical, infrastructure-ready solution for large-scale, policy-relevant traffic monitoring with minimal human intervention.
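The abstract reports a relative change in vehicle density (a 9% weekday decline in the Congestion Relief Zone). As a minimal sketch of how such a before/after figure can be computed, and not the authors' method, the Python snippet below compares the mean of two series. The function name `percent_change` and every number in it are hypothetical, chosen only for illustration; none of the values come from the paper.

```python
from statistics import mean


def percent_change(before, after):
    """Relative change of mean(after) versus mean(before), in percent."""
    base = mean(before)
    if base == 0:
        raise ValueError("baseline mean is zero; percent change is undefined")
    return (mean(after) - base) / base * 100.0


# Hypothetical daily average vehicle counts per camera frame.
# Illustrative values only, NOT data from the paper.
before_pricing = [20.0, 22.0, 18.0, 20.0]
after_pricing = [18.0, 19.0, 17.0, 18.0]

change = percent_change(before_pricing, after_pricing)
print(f"{change:.1f}%")
```

A negative result indicates a decline relative to the baseline period. A real analysis would also need to control for seasonality, weather, and camera viewpoint changes, which this sketch ignores.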
Related papers
- Semantic Edge-Cloud Communication for Real-Time Urban Traffic Surveillance with ViT and LLMs over Mobile Networks [5.862522659881676]
Real-time urban traffic surveillance is vital for Intelligent Transportation Systems (ITS) to ensure road safety, optimize traffic flow, track vehicle trajectories, and prevent collisions in smart cities. We propose a semantic communication framework that significantly reduces transmission overhead. This approach achieves a 99.9% reduction in data transmission size while maintaining an LLM response accuracy of 89% for reconstructed cropped images, compared to 93% accuracy with original cropped images.
arXiv Detail & Related papers (2025-09-25T14:53:36Z)
- Spatio-Temporal Graph Neural Network for Urban Spaces: Interpolating Citywide Traffic Volume [4.188237759092441]
We introduce the Graph Neural Network for Urban Interpolation (GNNUI), a novel urban traffic volume estimation approach. GNNUI employs a masking algorithm to learn, integrates node features to capture functional roles, and uses a loss function tailored to zero-inflated traffic distributions. In addition to the model, we introduce two new open, large-scale urban traffic volume benchmarks, covering different transportation modes.
arXiv Detail & Related papers (2025-05-07T13:34:00Z)
- Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring [6.648291808015463]
This research leverages the LLaVA visual grounding multimodal large language model (LLM) for traffic monitoring tasks on the real-time Quanser Interactive Lab simulation platform. Cameras placed at multiple urban locations collect real-time images from the simulation, which are fed into the LLaVA model with queries for analysis. The system achieves 84.3% accuracy in recognizing vehicle locations and 76.4% in determining steering direction, outperforming traditional models.
arXiv Detail & Related papers (2025-02-16T23:03:26Z)
- Learning Traffic Anomalies from Generative Models on Real-Time Observations [49.1574468325115]
We use the Spatiotemporal Generative Adversarial Network (STGAN) framework to capture complex spatial and temporal dependencies in traffic data. We apply STGAN to real-time, minute-by-minute observations from 42 traffic cameras across Gothenburg, Sweden, collected over several months in 2020. Our results demonstrate that the model effectively detects traffic anomalies with high precision and low false positive rates.
arXiv Detail & Related papers (2025-02-03T14:23:23Z)
- Multi-Source Urban Traffic Flow Forecasting with Drone and Loop Detector Data [61.9426776237409]
Drone-captured data can create an accurate multi-sensor mobility observatory for large-scale urban networks. A simple yet effective graph-based model, HiMSNet, is proposed to integrate multiple data modalities and learn spatio-temporal correlations.
arXiv Detail & Related papers (2025-01-07T03:23:28Z)
- Traffic Scene Parsing through the TSP6K Dataset [109.69836680564616]
We introduce a specialized traffic monitoring dataset, termed TSP6K, with high-quality pixel-level and instance-level annotations.
The dataset captures more crowded traffic scenes with several times more traffic participants than the existing driving scenes.
We propose a detail refining decoder for scene parsing, which recovers the details of different semantic regions in traffic scenes.
arXiv Detail & Related papers (2023-03-06T02:05:14Z)
- Scalable and Real-time Multi-Camera Vehicle Detection, Re-Identification, and Tracking [58.95210121654722]
We propose a real-time city-scale multi-camera vehicle tracking system that handles real-world, low-resolution CCTV instead of idealized and curated video streams.
Our method is ranked among the top five performers on the public leaderboard.
arXiv Detail & Related papers (2022-04-15T12:47:01Z)
- Road Network Guided Fine-Grained Urban Traffic Flow Inference [108.64631590347352]
Accurate inference of fine-grained traffic flow from coarse-grained observations is an emerging yet crucial problem.
We propose a novel Road-Aware Traffic Flow Magnifier (RATFM) that exploits the prior knowledge of road networks.
Our method can generate high-quality fine-grained traffic flow maps.
arXiv Detail & Related papers (2021-09-29T07:51:49Z)
- Traffic-Net: 3D Traffic Monitoring Using a Single Camera [1.1602089225841632]
We provide a practical platform for real-time traffic monitoring using a single CCTV traffic camera.
We adapt a custom YOLOv5 deep neural network model for vehicle/pedestrian detection and an enhanced SORT tracking algorithm.
We also develop a hierarchical traffic modelling solution based on short- and long-term temporal video data streams.
arXiv Detail & Related papers (2021-09-19T16:59:01Z)
- An Experimental Urban Case Study with Various Data Sources and a Model for Traffic Estimation [65.28133251370055]
We organize an experimental campaign with video measurement in an area within the urban network of Zurich, Switzerland.
We focus on capturing the traffic state in terms of traffic flow and travel times by ensuring measurements from established thermal cameras.
We propose a simple yet efficient Multiple Linear Regression (MLR) model to estimate travel times with fusion of various data sources.
arXiv Detail & Related papers (2021-08-02T08:13:57Z)
- Unsupervised Vehicle Counting via Multiple Camera Domain Adaptation [9.730985797769764]
Monitoring vehicle flows in cities is crucial to improve the urban environment and quality of life of citizens.
Current technologies for vehicle counting in images hinge on large quantities of annotated data, preventing their scalability to city-scale as new cameras are added to the system.
We propose and discuss a new methodology to design image-based vehicle density estimators with few labeled data via multiple camera domain adaptations.
arXiv Detail & Related papers (2020-04-20T13:00:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.