Towards Intelligent Transportation with Pedestrians and Vehicles In-the-Loop: A Surveillance Video-Assisted Federated Digital Twin Framework
- URL: http://arxiv.org/abs/2503.04170v1
- Date: Thu, 06 Mar 2025 07:36:06 GMT
- Title: Towards Intelligent Transportation with Pedestrians and Vehicles In-the-Loop: A Surveillance Video-Assisted Federated Digital Twin Framework
- Authors: Xiaolong Li, Jianhao Wei, Haidong Wang, Li Dong, Ruoyang Chen, Changyan Yi, Jun Cai, Dusit Niyato, Xuemin Shen
- Abstract summary: We propose a surveillance video-assisted federated digital twin (SV-FDT) framework to empower ITSs with pedestrians and vehicles in-the-loop. The architecture consists of three layers: (i) the end layer, which collects traffic surveillance videos from multiple sources; (ii) the edge layer, responsible for semantic segmentation-based visual understanding, twin agent-based interaction modeling, and local digital twin system (LDTS) creation in local regions; and (iii) the cloud layer, which integrates LDTSs across different regions to construct a global DT model in real time.
- Score: 62.47416496137193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In intelligent transportation systems (ITSs), incorporating pedestrians and vehicles in-the-loop is crucial for developing realistic and safe traffic management solutions. However, existing work falls short of simulating complex real-world ITS scenarios, primarily due to the lack of a digital twin implementation framework for characterizing interactions between pedestrians and vehicles at different locations in different traffic environments. In this article, we propose a surveillance video-assisted federated digital twin (SV-FDT) framework to empower ITSs with pedestrians and vehicles in-the-loop. Specifically, SV-FDT builds comprehensive pedestrian-vehicle interaction models by leveraging multi-source traffic surveillance videos. Its architecture consists of three layers: (i) the end layer, which collects traffic surveillance videos from multiple sources; (ii) the edge layer, responsible for semantic segmentation-based visual understanding, twin agent-based interaction modeling, and local digital twin system (LDTS) creation in local regions; and (iii) the cloud layer, which integrates LDTSs across different regions to construct a global DT model in real time. We analyze key design requirements and challenges and present core guidelines for SV-FDT's system implementation. A testbed evaluation demonstrates its effectiveness in optimizing traffic management. Comparisons with traditional terminal-server frameworks highlight SV-FDT's advantages in mirroring delays, recognition accuracy, and subjective evaluation. Finally, we identify some open challenges and discuss future research directions.
Related papers
- Salient Object Detection in Traffic Scene through the TSOD10K Dataset [22.615252113004402]
Traffic Salient Object Detection (TSOD) aims to segment the objects critical to driving safety by combining semantic (e.g., collision risks) and visual saliency.
Our research establishes the first foundation for safety-aware saliency analysis in intelligent transportation systems.
arXiv Detail & Related papers (2025-03-21T07:21:24Z)
- Knowledge-Informed Multi-Agent Trajectory Prediction at Signalized Intersections for Infrastructure-to-Everything [7.452533291998081]
We propose a multi-agent trajectory prediction framework at signalized intersections dedicated to Infrastructure-to-Everything (I2XTraj). Our framework leverages dynamic graph attention to integrate knowledge from traffic signals and driving behaviors. Our approach outperforms existing methods by more than 30% in both multi-agent and single-agent scenarios.
arXiv Detail & Related papers (2025-01-23T08:23:45Z)
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner. Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
- WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding [18.490299712769538]
We introduce the WTS dataset, highlighting detailed behaviors of both vehicles and pedestrians across over 1.2k video events in hundreds of traffic scenarios.
WTS integrates diverse perspectives from vehicle ego and fixed overhead cameras in a vehicle-infrastructure cooperative environment.
We also provide annotations for 5k publicly sourced pedestrian-related traffic videos.
arXiv Detail & Related papers (2024-07-22T03:29:22Z)
- AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception [26.84439405241999]
We present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle.
AIDE facilitates holistic driver monitoring through three distinctive characteristics.
Two fusion strategies are introduced to give new insights into learning effective multi-stream/modal representations.
arXiv Detail & Related papers (2023-07-26T03:12:05Z)
- Generative AI-empowered Simulation for Autonomous Driving in Vehicular Mixed Reality Metaverses [130.15554653948897]
In the vehicular mixed reality (MR) Metaverse, the distance between physical and virtual entities can be overcome.
Large-scale traffic and driving simulation via realistic data collection and fusion from the physical world is difficult and costly.
We propose an autonomous driving architecture, where generative AI is leveraged to synthesize unlimited conditioned traffic and driving data in simulations.
arXiv Detail & Related papers (2023-02-16T16:54:10Z)
- Federated Deep Learning Meets Autonomous Vehicle Perception: Design and Verification [168.67190934250868]
The federated learning-empowered connected autonomous vehicle (FLCAV) has been proposed.
FLCAV preserves privacy while reducing communication and annotation costs.
It is challenging to determine the network resources and road sensor poses for multi-stage training.
arXiv Detail & Related papers (2022-06-03T23:55:45Z)
- Traffic-Net: 3D Traffic Monitoring Using a Single Camera [1.1602089225841632]
We provide a practical platform for real-time traffic monitoring using a single CCTV traffic camera.
We adapt a custom YOLOv5 deep neural network model for vehicle/pedestrian detection and an enhanced SORT tracking algorithm.
We also develop a hierarchical traffic modelling solution based on short- and long-term temporal video data streams.
arXiv Detail & Related papers (2021-09-19T16:59:01Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner.
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z)