Related papers: MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum Fusion

MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum Fusion

URL: http://arxiv.org/abs/2511.10218v2
Date: Mon, 17 Nov 2025 01:59:07 GMT
Title: MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum Fusion
Authors: Haolong Xiang, Peisi Wang, Xiaolong Xu, Kun Yi, Xuyun Zhang, Quanzheng Sheng, Amin Beheshti, Wei Fan,
Abstract summary: We propose a novel Multimodal framework, MTP, for urban Traffic Profiling.<n>It learns multimodal features through numeric, visual, and textual perspectives.<n>Experiments on six real-world datasets demonstrate superior performance compared with the state-of-the-art approaches.
Score: 26.948053856623243
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With rapid urbanization in the modern era, traffic signals from various sensors have been playing a significant role in monitoring the states of cities, which provides a strong foundation in ensuring safe travel, reducing traffic congestion and optimizing urban mobility. Most existing methods for traffic signal modeling often rely on the original data modality, i.e., numerical direct readings from the sensors in cities. However, this unimodal approach overlooks the semantic information existing in multimodal heterogeneous urban data in different perspectives, which hinders a comprehensive understanding of traffic signals and limits the accurate prediction of complex traffic dynamics. To address this problem, we propose a novel Multimodal framework, MTP, for urban Traffic Profiling, which learns multimodal features through numeric, visual, and textual perspectives. The three branches drive for a multimodal perspective of urban traffic signal learning in the frequency domain, while the frequency learning strategies delicately refine the information for extraction. Specifically, we first conduct the visual augmentation for the traffic signals, which transforms the original modality into frequency images and periodicity images for visual learning. Also, we augment descriptive texts for the traffic signals based on the specific topic, background information and item description for textual learning. To complement the numeric information, we utilize frequency multilayer perceptrons for learning on the original modality. We design a hierarchical contrastive learning on the three branches to fuse the spectrum of three modalities. Finally, extensive experiments on six real-world datasets demonstrate superior performance compared with the state-of-the-art approaches.

Related papers

Towards Intelligent Transportation with Pedestrians and Vehicles In-the-Loop: A Surveillance Video-Assisted Federated Digital Twin Framework [62.47416496137193]
We propose a surveillance video assisted federated digital twin (SV-FDT) framework to empower ITSs with pedestrians and vehicles in-the-loop.<n>The architecture consists of three layers: (i) the end layer, which collects traffic surveillance videos from multiple sources; (ii) the edge layer, responsible for semantic segmentation-based visual understanding, twin agent-based interaction modeling, and local digital twin system (LDTS) creation in local regions; and (iii) the cloud layer, which integrates LDTSs across different regions to construct a global DT model in realtime.
arXiv Detail & Related papers (2025-03-06T07:36:06Z)
Multi-Source Urban Traffic Flow Forecasting with Drone and Loop Detector Data [61.9426776237409]
Drone-captured data can create an accurate multi-sensor mobility observatory for large-scale urban networks.<n>A simple yet effective graph-based model HiMSNet is proposed to integrate multiple data modalities and learn-temporal correlations.
arXiv Detail & Related papers (2025-01-07T03:23:28Z)
SSMT: Few-Shot Traffic Forecasting with Single Source Meta-Transfer [19.768107394061374]
We introduce Single Source Meta-Transfer Learning (SSMT) which relies only on a single source city for traffic prediction. Our method harnesses this transferred knowledge to enable few-shot traffic forecasting, particularly when the target city possesses limited data. We extend the idea of sinusoidal positional encoding to establish meta-learning tasks by leveraging diverse temporal traffic patterns from the source city.
arXiv Detail & Related papers (2024-10-21T02:17:25Z)
Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition [49.20086587208214]
We propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition. By using description texts, our method reduces the cross-domain differences between template and real traffic signs. Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels.
arXiv Detail & Related papers (2024-07-08T10:51:03Z)
TrafficGPT: Towards Multi-Scale Traffic Analysis and Generation with Spatial-Temporal Agent Framework [3.947797359736224]
We have designed a multi-scale traffic generation system, TrafficGPT, using three AI agents to process multi-scale traffic data. TrafficGPT consists of three essential AI agents: 1) a text-to-demand agent to interact with users and extract prediction tasks through texts; 2) a traffic prediction agent that leverages multi-scale traffic data to generate temporal features and similarity; and 3) a suggestion and visualization agent that uses the prediction results to generate suggestions and visualizations.
arXiv Detail & Related papers (2024-05-08T07:48:40Z)
BjTT: A Large-scale Multimodal Dataset for Traffic Prediction [49.93028461584377]
Traditional traffic prediction methods rely on historical traffic data to predict traffic trends. In this work, we explore how generative models combined with text describing the traffic system can be applied for traffic generation. We propose ChatTraffic, the first diffusion model for text-to-traffic generation.
arXiv Detail & Related papers (2024-03-08T04:19:56Z)
Traffic Reconstruction and Analysis of Natural Driving Behaviors at Unsignalized Intersections [1.7273380623090846]
This research involved recording traffic at various unsignalized intersections in Memphis, TN, during different times of the day. After manually labeling video data to capture specific variables, we reconstructed traffic scenarios in the SUMO simulation environment. The output data from these simulations offered a comprehensive analysis, including time-space diagrams for vehicle movement, travel time frequency distributions, and speed-position plots to identify bottleneck points.
arXiv Detail & Related papers (2023-12-22T09:38:06Z)
TL-GAN: Improving Traffic Light Recognition via Data Synthesis for Autonomous Driving [8.474436072102844]
We propose a novel traffic light generation approach TL-GAN to synthesize the data of rare classes to improve traffic light recognition for autonomous driving. In the image synthesis stage, our approach enables conditional generation to allow full control of the color of the generated traffic light images. In the sequence assembling stage, we design the style mixing and adaptive template to synthesize realistic and diverse traffic light sequences.
arXiv Detail & Related papers (2022-03-28T18:12:35Z)
3D Scene Understanding at Urban Intersection using Stereo Vision and Digital Map [4.640144833676576]
We introduce a stereo vision and 3D digital map based approach to spatially and temporally analyze the traffic situation at urban intersections. We qualitatively and quantitatively evaluate our proposed technique on real traffic data collected at an urban canyon in Tokyo to demonstrate the efficacy of the system.
arXiv Detail & Related papers (2021-12-10T02:05:15Z)
Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet) CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement. Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
arXiv Detail & Related papers (2021-11-30T04:30:10Z)
Road Network Guided Fine-Grained Urban Traffic Flow Inference [108.64631590347352]
Accurate inference of fine-grained traffic flow from coarse-grained one is an emerging yet crucial problem. We propose a novel Road-Aware Traffic Flow Magnifier (RATFM) that exploits the prior knowledge of road networks. Our method can generate high-quality fine-grained traffic flow maps.
arXiv Detail & Related papers (2021-09-29T07:51:49Z)
Traffic-Net: 3D Traffic Monitoring Using a Single Camera [1.1602089225841632]
We provide a practical platform for real-time traffic monitoring using a single CCTV traffic camera. We adapt a custom YOLOv5 deep neural network model for vehicle/pedestrian detection and an enhanced SORT tracking algorithm. We also develop a hierarchical traffic modelling solution based on short- and long-term temporal video data stream.
arXiv Detail & Related papers (2021-09-19T16:59:01Z)
An Experimental Urban Case Study with Various Data Sources and a Model for Traffic Estimation [65.28133251370055]
We organize an experimental campaign with video measurement in an area within the urban network of Zurich, Switzerland. We focus on capturing the traffic state in terms of traffic flow and travel times by ensuring measurements from established thermal cameras. We propose a simple yet efficient Multiple Linear Regression (MLR) model to estimate travel times with fusion of various data sources.
arXiv Detail & Related papers (2021-08-02T08:13:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.