Self-Supervised Real-Time Tracking of Military Vehicles in Low-FPS UAV Footage
- URL: http://arxiv.org/abs/2507.05229v1
- Date: Mon, 07 Jul 2025 17:39:11 GMT
- Title: Self-Supervised Real-Time Tracking of Military Vehicles in Low-FPS UAV Footage
- Authors: Markiyan Kostiv, Anatolii Adamovskyi, Yevhen Cherniavskyi, Mykyta Varenyk, Ostap Viniavskyi, Igor Krashenyi, Oles Dobosevych
- Abstract summary: Associating objects in low-frame-rate videos captured by moving unmanned aerial vehicles (UAVs) in actual combat scenarios is complex. We present how instance association learning from single-frame annotations can overcome these challenges.
- Score: 1.4957552011362911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-object tracking (MOT) aims to maintain consistent identities of objects across video frames. Associating objects in low-frame-rate videos captured by moving unmanned aerial vehicles (UAVs) in actual combat scenarios is complex due to rapid changes in object appearance and position within the frame. The task becomes even more challenging due to image degradation caused by cloud video streaming and compression algorithms. We present how instance association learning from single-frame annotations can overcome these challenges. We show that global features of the scene provide crucial context for low-FPS instance association, allowing our solution to be robust to distractors and gaps in detections. We also demonstrate that such a tracking approach maintains high association quality even when reducing the input image resolution and latent representation size for faster inference. Finally, we present a benchmark dataset of annotated military vehicles collected from publicly available data sources. This paper was initially presented at the NATO Science and Technology Organization Symposium (ICMCIS) organized by the Information Systems Technology (IST) Scientific and Technical Committee, IST-209-RSY - the ICMCIS, held in Oeiras, Portugal, 13-14 May 2025.
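The abstract describes matching object instances across widely spaced frames by comparing learned embeddings. A minimal sketch of such embedding-based association, assuming cosine similarity and greedy one-to-one matching (the paper's actual architecture and matching rule are not specified here):

```python
import numpy as np

def associate(prev_embs: np.ndarray, curr_embs: np.ndarray, thresh: float = 0.5):
    """Greedily match current detections to existing tracks by cosine similarity.

    prev_embs: (M, D) embeddings of existing tracks.
    curr_embs: (N, D) embeddings of new detections.
    Returns a list of (track_idx, detection_idx) pairs.
    """
    # Normalize rows so the dot product equals cosine similarity.
    p = prev_embs / np.linalg.norm(prev_embs, axis=1, keepdims=True)
    c = curr_embs / np.linalg.norm(curr_embs, axis=1, keepdims=True)
    sim = p @ c.T  # (M, N) similarity matrix

    matches = []
    while sim.size and sim.max() > thresh:
        i, j = np.unravel_index(sim.argmax(), sim.shape)
        matches.append((int(i), int(j)))
        sim[i, :] = -np.inf  # each track and each detection is used at most once
        sim[:, j] = -np.inf
    return matches
```

The similarity threshold handles the "gaps in detections" case: a track that finds no detection above `thresh` is simply left unmatched rather than forced onto a distractor. In the paper's setting, the embeddings would additionally be conditioned on global scene features.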
Related papers
- Tracking the Unstable: Appearance-Guided Motion Modeling for Robust Multi-Object Tracking in UAV-Captured Videos [58.156141601478794]
Multi-object tracking in UAV-captured videos (UAVT) aims to track multiple objects while maintaining consistent identities across frames of a given video. Existing methods typically model motion cues and appearance separately, overlooking their interplay and resulting in suboptimal tracking performance. We propose AMOT, which exploits appearance and motion cues through two key components: an Appearance-Motion Consistency (AMC) matrix and a Motion-aware Track Continuation (MTC) module.
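The idea of jointly scoring appearance and motion can be illustrated with a fused cost matrix solved by the Hungarian algorithm. This is a generic sketch, not the AMC matrix from the paper: the fusion weight `alpha` and the choice of cosine similarity plus IoU are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fused_assignment(track_embs, det_embs, track_boxes, det_boxes, alpha=0.5):
    """Blend appearance similarity and motion overlap into one cost matrix,
    then solve the assignment optimally with the Hungarian algorithm."""
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    app = t @ d.T  # appearance similarity in [-1, 1]
    mot = np.array([[iou(tb, db) for db in det_boxes] for tb in track_boxes])
    cost = -(alpha * app + (1 - alpha) * mot)  # negate: the solver minimizes
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))
```

Fusing the two cues in one matrix lets appearance disambiguate crossing objects with similar motion, and motion disambiguate visually similar objects, which is exactly the interplay the paper argues separate models miss.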
arXiv Detail & Related papers (2025-08-03T12:06:47Z)
- Lightweight Multi-Frame Integration for Robust YOLO Object Detection in Videos [11.532574301455854]
We propose a highly effective strategy for multi-frame video object detection. Our method improves robustness, especially for lightweight models. We contribute the BOAT360 benchmark dataset to support future research in multi-frame video object detection in challenging real-world scenarios.
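One common way to feed multi-frame context to a single-frame detector such as YOLO is to stack consecutive frames along the channel axis; this is a generic sketch of that idea, not necessarily the integration strategy used in the paper.

```python
import numpy as np

def stack_frames(frames: list) -> np.ndarray:
    """Concatenate T consecutive frames along the channel axis so a
    single-frame detector can see short-term temporal context.

    frames: list of (H, W, C) arrays from the same clip.
    Returns an (H, W, T * C) array; the detector's first conv layer
    must be widened to accept T * C input channels.
    """
    assert len({f.shape for f in frames}) == 1, "frames must share one shape"
    return np.concatenate(frames, axis=-1)
```

This keeps the detector itself lightweight: only the first layer grows, while the temporal cue (motion blur, appearing/disappearing objects) is available to every downstream layer.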
arXiv Detail & Related papers (2025-06-25T15:49:07Z)
- Spatio-temporal Graph Learning on Adaptive Mined Key Frames for High-performance Multi-Object Tracking [5.746443489229576]
Key Frame Extraction (KFE) module leverages reinforcement learning to adaptively segment videos. Intra-Frame Feature Fusion (IFF) module uses a Graph Convolutional Network (GCN) to facilitate information exchange between the target and surrounding objects. Our proposed tracker achieves impressive results on the MOT17 dataset.
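The GCN-based fusion can be pictured as one round of message passing over a graph whose nodes are detected objects. A minimal sketch of a single graph-convolution step, assuming mean aggregation with self-loops (the paper's exact layer definition is not given here):

```python
import numpy as np

def gcn_layer(node_feats: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: each node (a detected object) averages
    its neighbours' features, mixes them through a learned weight matrix,
    and applies a ReLU.

    node_feats: (N, D) per-object features.
    adj:        (N, N) 0/1 adjacency matrix (which objects interact).
    weight:     (D, D_out) learned projection.
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)    # node degrees for normalization
    msg = (a_hat / deg) @ node_feats          # mean aggregation over neighbours
    return np.maximum(msg @ weight, 0.0)      # linear projection + ReLU
```

After such a step, each object's feature encodes its surroundings, which is what lets the tracker exchange information "between the target and surrounding objects".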
arXiv Detail & Related papers (2025-01-17T11:36:38Z)
- Vision-Based Detection of Uncooperative Targets and Components on Small Satellites [6.999319023465766]
Space debris and inactive satellites pose a threat to the safety and integrity of operational spacecraft.
Recent advancements in computer vision models can be used to improve upon existing methods for tracking such uncooperative targets.
This paper introduces an autonomous detection model designed to identify and monitor these objects using learning and computer vision.
arXiv Detail & Related papers (2024-08-22T02:48:13Z)
- Analysis of Unstructured High-Density Crowded Scenes for Crowd Monitoring [55.2480439325792]
We are interested in developing an automated system for detection of organized movements in human crowds. Computer vision algorithms can extract information from videos of crowded scenes. We can estimate the number of participants in an organized cohort.
arXiv Detail & Related papers (2024-08-06T22:09:50Z)
- View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV [43.37259596065606]
We address the challenge of multi-object tracking (MOT) in moving Unmanned Aerial Vehicle (UAV) scenarios.
Changes in the scene background not only render traditional frame-to-frame object IOU association methods ineffective but also introduce significant view shifts in the objects.
We propose a novel universal HomView-MOT framework, which for the first time harnesses the view Homography inherent in changing scenes to solve MOT challenges.
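Homography-based matching rests on mapping points from the previous frame's view into the current one before comparing boxes. A minimal sketch of that projection step, assuming the 3x3 homography has already been estimated (e.g. from background feature matches):

```python
import numpy as np

def warp_points(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map 2D points through a 3x3 homography H using homogeneous coordinates.

    pts: (N, 2) pixel coordinates in the previous frame.
    Returns (N, 2) coordinates in the current frame, so boxes observed by a
    moving camera can be compared in a common view before IoU association.
    """
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # lift to (N, 3)
    mapped = homog @ H.T                              # apply the homography
    return mapped[:, :2] / mapped[:, 2:3]             # perspective divide
```

Warping the previous frame's box corners this way restores overlap between corresponding boxes, which is precisely why frame-to-frame IoU association fails without it when the UAV moves.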
arXiv Detail & Related papers (2024-03-16T06:48:33Z)
- Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over a corrupted video sequence, leveraging the spatio-temporal coherence and internal statistics of videos.
arXiv Detail & Related papers (2023-12-13T01:57:11Z)
- Dense Video Object Captioning from Disjoint Supervision [77.47084982558101]
We propose a new task and model for dense video object captioning.
This task unifies spatial and temporal localization in video.
We show how our model improves upon a number of strong baselines for this new task.
arXiv Detail & Related papers (2023-06-20T17:57:23Z)
- Scalable and Real-time Multi-Camera Vehicle Detection, Re-Identification, and Tracking [58.95210121654722]
We propose a real-time city-scale multi-camera vehicle tracking system that handles real-world, low-resolution CCTV instead of idealized and curated video streams.
Our method is ranked among the top five performers on the public leaderboard.
arXiv Detail & Related papers (2022-04-15T12:47:01Z)
- AIM 2020 Challenge on Video Temporal Super-Resolution [118.46127362093135]
This paper reports the second AIM challenge on Video Temporal Super-Resolution (VTSR).
arXiv Detail & Related papers (2020-09-28T00:10:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.