AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification
- URL: http://arxiv.org/abs/2503.08121v2
- Date: Mon, 17 Mar 2025 01:07:25 GMT
- Title: AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification
- Authors: Huy Nguyen, Kien Nguyen, Akila Pemasiri, Feng Liu, Sridha Sridharan, Clinton Fookes,
- Abstract summary: We introduce AG-VPReID, a new large-scale dataset for aerial-ground video-based person re-identification (ReID)<n>This dataset comprises 6,632 subjects, 32,321 tracklets and over 9.6 million frames captured by drones (altitudes ranging from 15-120m), CCTV, and wearable cameras.<n>We propose AG-VPReID-Net, an end-to-end framework composed of three complementary streams.
- Score: 39.350429734981184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce AG-VPReID, a new large-scale dataset for aerial-ground video-based person re-identification (ReID) that comprises 6,632 subjects, 32,321 tracklets and over 9.6 million frames captured by drones (altitudes ranging from 15-120m), CCTV, and wearable cameras. This dataset offers a real-world benchmark for evaluating the robustness to significant viewpoint changes, scale variations, and resolution differences in cross-platform aerial-ground settings. In addition, to address these challenges, we propose AG-VPReID-Net, an end-to-end framework composed of three complementary streams: (1) an Adapted Temporal-Spatial Stream addressing motion pattern inconsistencies and facilitating temporal feature learning, (2) a Normalized Appearance Stream leveraging physics-informed techniques to tackle resolution and appearance changes, and (3) a Multi-Scale Attention Stream handling scale variations across drone altitudes. We integrate visual-semantic cues from all streams to form a robust, viewpoint-invariant whole-body representation. Extensive experiments demonstrate that AG-VPReID-Net outperforms state-of-the-art approaches on both our new dataset and existing video-based ReID benchmarks, showcasing its effectiveness and generalizability. Nevertheless, the performance gap observed on AG-VPReID across all methods underscores the dataset's challenging nature. The dataset, code and trained models are available at https://github.com/agvpreid25/AG-VPReID-Net.
Related papers
- SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification [61.753607285860944]
We propose a novel two-stage feature learning framework named SD-ReID for AG-ReID.
In the first stage, we train a simple ViT-based model to extract coarse-grained representations and controllable conditions.
In the second stage, we fine-tune the SD model to learn complementary representations guided by the controllable conditions.
arXiv Detail & Related papers (2025-04-13T12:44:50Z) - Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark [15.405137983083875]
Aerial-ground cooperation offers a promising solution by integrating UAVs' aerial views with ground vehicles' local observations.
This paper presents a comprehensive solution for aerial-ground cooperative 3D perception through three key contributions.
arXiv Detail & Related papers (2025-03-10T07:00:07Z) - Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach [37.53617620654204]
We construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID.
G2A-VReID dataset has the following characteristics: 1) Drastic view changes; 2) Large number of annotated identities; 3) Rich outdoor scenarios; 4) Huge difference in resolution.
arXiv Detail & Related papers (2024-08-14T12:29:49Z) - Object Re-identification via Spatial-temporal Fusion Networks and Causal Identity Matching [4.123763595394021]
We introduce a novel ReID framework that leverages a spatial-temporal fusion network and causal identity matching (CIM)
Our framework estimates camera network topology using a proposed adaptive Parzen window and combines appearance features with spatial-temporal cues within the fusion network.
This approach has demonstrated outstanding performance across several datasets, including VeRi776, Vehicle-3I, and Market-1501, achieving up to 99.70% rank-1 accuracy and 95.5% mAP.
arXiv Detail & Related papers (2024-08-10T13:50:43Z) - Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors [80.92195378575671]
We describe a strong baseline for Arbitra-scale super-resolution (AVSR)
We then introduce ST-AVSR by equipping our baseline with a multi-scale structural and textural prior computed from the pre-trained VGG network.
Comprehensive experiments show that ST-AVSR significantly improves super-resolution quality, generalization ability, and inference speed over the state-of-theart.
arXiv Detail & Related papers (2024-07-13T15:27:39Z) - View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network [87.36616083812058]
view-decoupled transformer (VDT) is proposed as a simple yet effective framework for aerial-ground person re-identification.
Two major components are designed in VDT to decouple view-related and view-unrelated features.
In addition, we contribute a large-scale AGPReID dataset called CARGO, consisting of five/eight aerial/ground cameras, 5,000 identities, and 108,563 images.
arXiv Detail & Related papers (2024-03-21T16:08:21Z) - AG-ReID.v2: Bridging Aerial and Ground Views for Person Re-identification [39.58286453178339]
Aerial-ground person re-identification (Re-ID) presents unique challenges in computer vision.
We introduce AG-ReID.v2, a dataset specifically designed for person Re-ID in mixed aerial and ground scenarios.
This dataset comprises 100,502 images of 1,615 unique individuals, each annotated with matching IDs and 15 soft attribute labels.
arXiv Detail & Related papers (2024-01-05T04:53:33Z) - Scalable Video Object Segmentation with Identification Mechanism [125.4229430216776]
This paper explores the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object (VOS)
We present two innovative approaches, Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST)
Our approaches surpass the state-of-the-art competitors and display exceptional efficiency and scalability consistently across all six benchmarks.
arXiv Detail & Related papers (2022-03-22T03:33:27Z) - TransVOD: End-to-end Video Object Detection with Spatial-Temporal
Transformers [96.981282736404]
We present TransVOD, the first end-to-end video object detection system based on spatial-temporal Transformer architectures.
Our proposed TransVOD++ sets a new state-of-the-art record in terms of accuracy on ImageNet VID with 90.0% mAP.
Our proposed TransVOD Lite also achieves the best speed and accuracy trade-off with 83.7% mAP while running at around 30 FPS.
arXiv Detail & Related papers (2022-01-13T16:17:34Z) - Anchor Retouching via Model Interaction for Robust Object Detection in
Aerial Images [15.404024559652534]
We present an effective Dynamic Enhancement Anchor (DEA) network to construct a novel training sample generator.
Our method achieves state-of-the-art performance in accuracy with moderate inference speed and computational overhead for training.
arXiv Detail & Related papers (2021-12-13T14:37:20Z) - Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z) - HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
We take advantages of both CNNs and Transformers for image-based person Re-ID with high performance.
Work is the first to take advantages of both CNNs and Transformers for image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.