Related papers: SPOT!: Map-Guided LLM Agent for Unsupervised Multi-CCTV Dynamic Object Tracking

URL: http://arxiv.org/abs/2512.20975v1
Date: Wed, 24 Dec 2025 06:04:58 GMT
Title: SPOT!: Map-Guided LLM Agent for Unsupervised Multi-CCTV Dynamic Object Tracking
Authors: Yujin Noh, Inho Jake Park, Chigon Hwang
Abstract summary: This paper proposes SPOT (Spatial Prediction Over Trajectories), a map-guided LLM agent capable of tracking vehicles even in blind spots of multi-CCTV environments without prior training. It transforms the vehicle's position into the actual world coordinate system using the relative position and FOV information of objects observed in CCTV images. Experimental results based on the CARLA simulator in a virtual city environment confirmed that the proposed method accurately predicts the next appearing CCTV even in blind spot sections.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: CCTV-based vehicle tracking systems face structural limitations in continuously connecting the trajectories of the same vehicle across multiple camera environments. In particular, blind spots occur due to the intervals between CCTVs and limited Fields of View (FOV), which leads to object ID switching and trajectory loss, thereby reducing the reliability of real-time path prediction. This paper proposes SPOT (Spatial Prediction Over Trajectories), a map-guided LLM agent capable of tracking vehicles even in blind spots of multi-CCTV environments without prior training. The proposed method represents road structures (Waypoints) and CCTV placement information as documents based on 2D spatial coordinates and organizes them through chunking techniques to enable real-time querying and inference. Furthermore, it transforms the vehicle's position into the actual world coordinate system using the relative position and FOV information of objects observed in CCTV images. By combining map spatial information with the vehicle's moving direction, speed, and driving patterns, a beam search is performed at the intersection level to derive candidate CCTV locations where the vehicle is most likely to enter after the blind spot. Experimental results based on the CARLA simulator in a virtual city environment confirmed that the proposed method accurately predicts the next appearing CCTV even in blind spot sections, maintaining continuous vehicle trajectories more effectively than existing techniques.
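The intersection-level beam search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the graph layout, the `heading_score` function, the `predict_next_cctv` helper, and all parameter names are hypothetical, and the scoring uses only the moving direction, whereas the abstract also mentions speed, driving patterns, and an LLM agent. It only shows how a beam over road-graph nodes could rank candidate CCTVs after a blind spot.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def heading_score(heading: float, src: Point, dst: Point) -> float:
    """Map the cosine between the travel heading and the edge direction to (0, 1]."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return 1e-6
    cos = (dx * math.cos(heading) + dy * math.sin(heading)) / norm
    return max((cos + 1.0) / 2.0, 1e-6)  # floor keeps log() finite


def predict_next_cctv(
    nodes: Dict[str, Point],      # intersection id -> 2D map coordinate (assumed)
    edges: Dict[str, List[str]],  # directed road links between intersections (assumed)
    cctv_at: Dict[str, str],      # intersection id -> CCTV id covering it (assumed)
    start: str,
    heading: float,               # last observed heading in radians
    beam_width: int = 3,
    max_depth: int = 4,
) -> List[Tuple[str, float]]:
    """Return CCTV ids ranked by the best log-score of any path reaching them."""
    beams = [(0.0, start, heading)]  # (log-score, node, heading on arrival)
    candidates: Dict[str, float] = {}
    for _ in range(max_depth):
        expanded = []
        for logp, node, head in beams:
            for nxt in edges.get(node, []):
                src, dst = nodes[node], nodes[nxt]
                new_logp = logp + math.log(heading_score(head, src, dst))
                new_head = math.atan2(dst[1] - src[1], dst[0] - src[0])
                if nxt in cctv_at:
                    # The path ends where the vehicle would re-enter a camera view.
                    cam = cctv_at[nxt]
                    candidates[cam] = max(candidates.get(cam, -math.inf), new_logp)
                else:
                    expanded.append((new_logp, nxt, new_head))
        # Keep only the best partial paths at this intersection depth.
        expanded.sort(key=lambda b: b[0], reverse=True)
        beams = expanded[:beam_width]
        if not beams:
            break
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)


# Toy map: the vehicle leaves A heading east; cameras cover C (north) and D (east).
nodes = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0), "D": (20.0, 0.0)}
edges = {"A": ["B", "C"], "B": ["D"]}
cctv_at = {"C": "cam_north", "D": "cam_east"}
ranked = predict_next_cctv(nodes, edges, cctv_at, start="A", heading=0.0)
print(ranked[0][0])  # cam_east ranks first: it lies straight ahead
```

In this toy example the eastbound path through B keeps a perfect heading score, so `cam_east` outranks `cam_north` even though it is two intersections away. A fuller version would fold speed and elapsed time into the score; that part is left out here.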

Related papers

Towards Intelligent Transportation with Pedestrians and Vehicles In-the-Loop: A Surveillance Video-Assisted Federated Digital Twin Framework [62.47416496137193]
We propose a surveillance video assisted federated digital twin (SV-FDT) framework to empower ITSs with pedestrians and vehicles in-the-loop. The architecture consists of three layers: (i) the end layer, which collects traffic surveillance videos from multiple sources; (ii) the edge layer, responsible for semantic segmentation-based visual understanding, twin agent-based interaction modeling, and local digital twin system (LDTS) creation in local regions; and (iii) the cloud layer, which integrates LDTSs across different regions to construct a global DT model in real time.
arXiv Detail & Related papers (2025-03-06T07:36:06Z)
Neural Semantic Map-Learning for Autonomous Vehicles [85.8425492858912]
We present a mapping system that fuses local submaps gathered from a fleet of vehicles at a central instance to produce a coherent map of the road environment. Our method jointly aligns and merges the noisy and incomplete local submaps using a scene-specific Neural Signed Distance Field. We leverage memory-efficient sparse feature-grids to scale to large areas and introduce a confidence score to model uncertainty in scene reconstruction.
arXiv Detail & Related papers (2024-10-10T10:10:03Z)
Application of 2D Homography for High Resolution Traffic Data Collection using CCTV Cameras [9.946460710450319]
This study implements a three-stage video analytics framework for extracting high-resolution traffic data from CCTV cameras. The key components of the framework include object recognition, perspective transformation, and vehicle trajectory reconstruction. The results of the study showed about +/- 4.5% error rate for directional traffic counts, less than 10% MSE for speed bias between camera estimates.
arXiv Detail & Related papers (2024-01-14T07:33:14Z)
MSight: An Edge-Cloud Infrastructure-based Perception System for Connected Automated Vehicles [58.461077944514564]
This paper presents MSight, a cutting-edge roadside perception system specifically designed for automated vehicles. MSight offers real-time vehicle detection, localization, tracking, and short-term trajectory prediction. Evaluations underscore the system's capability to uphold lane-level accuracy with minimal latency.
arXiv Detail & Related papers (2023-10-08T21:32:30Z)
Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view. Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
Real-Time Accident Detection in Traffic Surveillance Using Deep Learning [0.8808993671472349]
This paper presents a new efficient framework for accident detection at intersections for traffic surveillance applications. The proposed framework consists of three hierarchical steps, including efficient and accurate object detection based on the state-of-the-art YOLOv4 method. The robustness of the proposed framework is evaluated using video sequences collected from YouTube with diverse illumination conditions.
arXiv Detail & Related papers (2022-08-12T19:07:20Z)
Scalable and Real-time Multi-Camera Vehicle Detection, Re-Identification, and Tracking [58.95210121654722]
We propose a real-time city-scale multi-camera vehicle tracking system that handles real-world, low-resolution CCTV instead of idealized and curated video streams. Our method is ranked among the top five performers on the public leaderboard.
arXiv Detail & Related papers (2022-04-15T12:47:01Z)
Traffic-Net: 3D Traffic Monitoring Using a Single Camera [1.1602089225841632]
We provide a practical platform for real-time traffic monitoring using a single CCTV traffic camera. We adapt a custom YOLOv5 deep neural network model for vehicle/pedestrian detection and an enhanced SORT tracking algorithm. We also develop a hierarchical traffic modelling solution based on short- and long-term temporal video data stream.
arXiv Detail & Related papers (2021-09-19T16:59:01Z)
An overcome of far-distance limitation on tunnel CCTV-based accident detection in AI deep-learning frameworks [0.0]
Tunnel CCTVs are installed at low heights and at long intervals. In existing tunnel CCTV-based accident detection systems, it is almost impossible to detect vehicles far from the CCTV. This paper creates datasets consisting of images and bounding boxes based on the original and warped images of the CCTV.
arXiv Detail & Related papers (2021-07-22T10:42:25Z)
Online Clustering-based Multi-Camera Vehicle Tracking in Scenarios with overlapping FOVs [2.6365690297272617]
Multi-Target Multi-Camera (MTMC) vehicle tracking is an essential task of visual traffic monitoring. We present a new low-latency online approach for MTMC tracking in scenarios with partially overlapping fields of view.
arXiv Detail & Related papers (2021-02-08T09:55:55Z)
Road Curb Detection and Localization with Monocular Forward-view Vehicle Camera [74.45649274085447]
We propose a robust method for estimating road curb 3D parameters using a calibrated monocular camera equipped with a fisheye lens. Our approach is able to estimate the vehicle to curb distance in real time with mean accuracy of more than 90%.
arXiv Detail & Related papers (2020-02-28T00:24:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.