AG-VPReID.VIR: Bridging Aerial and Ground Platforms for Video-based Visible-Infrared Person Re-ID
- URL: http://arxiv.org/abs/2507.17995v1
- Date: Thu, 24 Jul 2025 00:13:25 GMT
- Title: AG-VPReID.VIR: Bridging Aerial and Ground Platforms for Video-based Visible-Infrared Person Re-ID
- Authors: Huy Nguyen, Kien Nguyen, Akila Pemasiri, Akmal Jahan, Clinton Fookes, Sridha Sridharan
- Abstract summary: We present AG-VPReID.VIR, the first aerial-ground cross-modality video-based person Re-ID dataset. This dataset captures 1,837 identities across 4,861 tracklets (124,855 frames) using both UAV-mounted and fixed CCTV cameras in RGB and infrared modalities. Our approach bridges the domain gaps between aerial-ground perspectives and RGB-IR modalities through style-robust feature learning, memory-based cross-view adaptation, and intermediary-guided temporal modeling.
- Score: 36.00219379027019
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Person re-identification (Re-ID) across visible and infrared modalities is crucial for 24-hour surveillance systems, but existing datasets primarily focus on ground-level perspectives. While ground-based IR systems offer nighttime capabilities, they suffer from occlusions, limited coverage, and vulnerability to obstructions--problems that aerial perspectives uniquely solve. To address these limitations, we introduce AG-VPReID.VIR, the first aerial-ground cross-modality video-based person Re-ID dataset. This dataset captures 1,837 identities across 4,861 tracklets (124,855 frames) using both UAV-mounted and fixed CCTV cameras in RGB and infrared modalities. AG-VPReID.VIR presents unique challenges including cross-viewpoint variations, modality discrepancies, and temporal dynamics. Additionally, we propose TCC-VPReID, a novel three-stream architecture designed to address the joint challenges of cross-platform and cross-modality person Re-ID. Our approach bridges the domain gaps between aerial-ground perspectives and RGB-IR modalities, through style-robust feature learning, memory-based cross-view adaptation, and intermediary-guided temporal modeling. Experiments show that AG-VPReID.VIR presents distinctive challenges compared to existing datasets, with our TCC-VPReID framework achieving significant performance gains across multiple evaluation protocols. Dataset and code are available at https://github.com/agvpreid25/AG-VPReID.VIR.
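The abstract describes TCC-VPReID only at a high level: three complementary streams covering style-robust feature learning, memory-based cross-view adaptation, and intermediary-guided temporal modeling. The PyTorch sketch below illustrates what such a three-stream video Re-ID model could look like; the module names, feature dimensions, ResNet-50 backbone, and fusion scheme are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Hypothetical three-stream video Re-ID model in the spirit of TCC-VPReID.
# All module names and sizes are assumptions for illustration only.
import torch
import torch.nn as nn
import torchvision.models as models


class StreamBackbone(nn.Module):
    """Frame-level ResNet-50 encoder used by each of the three streams."""
    def __init__(self, feat_dim=512):
        super().__init__()
        resnet = models.resnet50(weights=None)
        self.encoder = nn.Sequential(*list(resnet.children())[:-1])  # globally pooled 2048-d
        self.proj = nn.Linear(2048, feat_dim)

    def forward(self, clips):  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        x = self.encoder(clips.flatten(0, 1)).flatten(1)  # (B*T, 2048)
        return self.proj(x).view(b, t, -1)                # (B, T, feat_dim)


class TCCVPReIDSketch(nn.Module):
    def __init__(self, num_ids=1837, feat_dim=512, memory_size=4096):
        super().__init__()
        self.style_stream = StreamBackbone(feat_dim)     # style-robust appearance features
        self.view_stream = StreamBackbone(feat_dim)      # features for cross-view adaptation
        self.temporal_stream = StreamBackbone(feat_dim)  # frame features for temporal modeling
        # Placeholder memory bank: in the full method this would store cross-view
        # prototypes used for aerial-ground adaptation (not exercised in this sketch).
        self.register_buffer("view_memory", torch.zeros(memory_size, feat_dim))
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.classifier = nn.Linear(3 * feat_dim, num_ids)

    def forward(self, clips):
        style = self.style_stream(clips).mean(dim=1)        # (B, D), temporal average pooling
        view = self.view_stream(clips).mean(dim=1)          # (B, D)
        _, h = self.temporal(self.temporal_stream(clips))   # GRU over frames, h: (1, B, D)
        fused = torch.cat([style, view, h.squeeze(0)], dim=-1)
        return self.classifier(fused), fused                # ID logits + tracklet embedding


# Example: a batch of 2 tracklets, 8 frames each, 256x128 crops.
logits, embedding = TCCVPReIDSketch()(torch.randn(2, 8, 3, 256, 128))
```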
Related papers
- Cross-Spectral Body Recognition with Side Information Embedding: Benchmarks on LLCM and Analyzing Range-Induced Occlusions on IJB-MDF [51.36007967653781]
Vision Transformers (ViTs) have demonstrated impressive performance across a wide range of biometric tasks, including face and body recognition. In this work, we adapt a ViT model pretrained on visible (VIS) imagery to the challenging problem of cross-spectral body recognition. Building on this idea, we integrate Side Information Embedding (SIE) and examine the impact of encoding domain and camera information to enhance cross-spectral matching. Surprisingly, our results show that encoding only camera information - without explicitly incorporating domain information - achieves state-of-the-art performance on the LLCM dataset.
arXiv Detail & Related papers (2025-06-10T16:20:52Z)
- SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification [61.753607285860944]
We propose a novel two-stage feature learning framework named SD-ReID for AG-ReID. In the first stage, we train a simple ViT-based model to extract coarse-grained representations and controllable conditions. In the second stage, we fine-tune the SD model to learn complementary representations guided by the controllable conditions.
arXiv Detail & Related papers (2025-04-13T12:44:50Z)
- Multi-modal Multi-platform Person Re-Identification: Benchmark and Method [58.59888754340054]
MP-ReID is a novel dataset designed specifically for multi-modality and multi-platform ReID. This benchmark compiles data from 1,930 identities across diverse modalities, including RGB, infrared, and thermal imaging. We introduce Uni-Prompt ReID, a framework with specifically designed prompts tailored for cross-modality and cross-platform scenarios.
arXiv Detail & Related papers (2025-03-21T12:27:49Z)
- Multi-Domain Biometric Recognition using Body Embeddings [51.36007967653781]
We show that body embeddings perform better than face embeddings in medium-wave infrared (MWIR) and long-wave infrared (LWIR) domains. We leverage a vision transformer architecture to establish benchmark results on the IJB-MDF dataset. We also show that finetuning a body model, pretrained exclusively on VIS data, with a simple combination of cross-entropy and triplet losses achieves state-of-the-art mAP scores (a minimal sketch of this loss combination follows the list below).
arXiv Detail & Related papers (2025-03-13T22:38:18Z)
- AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification [39.350429734981184]
We introduce AG-VPReID, a new large-scale dataset for aerial-ground video-based person re-identification (ReID). This dataset comprises 6,632 subjects, 32,321 tracklets and over 9.6 million frames captured by drones (altitudes ranging from 15-120m), CCTV, and wearable cameras. We propose AG-VPReID-Net, an end-to-end framework composed of three complementary streams.
arXiv Detail & Related papers (2025-03-11T07:38:01Z)
- View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network [87.36616083812058]
A view-decoupled transformer (VDT) is proposed as a simple yet effective framework for aerial-ground person re-identification.
Two major components are designed in VDT to decouple view-related and view-unrelated features.
In addition, we contribute a large-scale AGPReID dataset called CARGO, consisting of five/eight aerial/ground cameras, 5,000 identities, and 108,563 images.
arXiv Detail & Related papers (2024-03-21T16:08:21Z)
- AG-ReID.v2: Bridging Aerial and Ground Views for Person Re-identification [39.58286453178339]
Aerial-ground person re-identification (Re-ID) presents unique challenges in computer vision.
We introduce AG-ReID.v2, a dataset specifically designed for person Re-ID in mixed aerial and ground scenarios.
This dataset comprises 100,502 images of 1,615 unique individuals, each annotated with matching IDs and 15 soft attribute labels.
arXiv Detail & Related papers (2024-01-05T04:53:33Z)
- Visible-Infrared Person Re-Identification Using Privileged Intermediate Information [10.816003787786766]
Cross-modal person re-identification (ReID) is challenging due to the large domain shift in data distributions between RGB and IR modalities.
This paper introduces a novel approach for creating an intermediate virtual domain that acts as a bridge between the two main domains.
We devised a new method to generate images between the visible and infrared domains that provide additional information for training a deep ReID model.
arXiv Detail & Related papers (2022-09-19T21:08:14Z)
- Learning Modal-Invariant and Temporal-Memory for Video-based Visible-Infrared Person Re-Identification [46.49866514866999]
We primarily study the video-based cross-modal person Re-ID method.
We show that performance improves as the number of frames in a tracklet increases.
A novel method is proposed, which projects two modalities to a modal-invariant subspace.
arXiv Detail & Related papers (2022-08-04T04:43:52Z)
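The Multi-Domain Biometric Recognition entry above reports that fine-tuning a VIS-pretrained body model with a simple combination of cross-entropy and triplet losses reaches state-of-the-art mAP. The sketch below shows one way such a combined objective is commonly written; the loss weights, margin, and batch layout are assumptions for illustration, not that paper's reported settings.

```python
# Illustrative combination of identity cross-entropy and triplet losses.
# Margin, weights, and the (anchor, positive, negative) batch layout are assumed.
import torch
import torch.nn as nn

ce_loss = nn.CrossEntropyLoss()
triplet_loss = nn.TripletMarginLoss(margin=0.3)  # assumed margin


def reid_loss(logits, embeddings, labels, ce_weight=1.0, tri_weight=1.0):
    """ID classification loss plus a metric-learning triplet loss.

    Assumes the batch is arranged as consecutive thirds of anchors,
    positives (same ID), and negatives (different ID), so the embedding
    tensor can be split evenly into the three roles.
    """
    anchors, positives, negatives = embeddings.chunk(3, dim=0)
    loss = ce_weight * ce_loss(logits, labels)
    loss = loss + tri_weight * triplet_loss(anchors, positives, negatives)
    return loss


# Example: 6 samples (2 anchor/positive/negative triplets), 512-d embeddings, 100 IDs.
loss = reid_loss(torch.randn(6, 100), torch.randn(6, 512), torch.randint(0, 100, (6,)))
```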