Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach
- URL: http://arxiv.org/abs/2408.07500v2
- Date: Tue, 3 Sep 2024 02:50:56 GMT
- Title: Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach
- Authors: Shizhou Zhang, Wenlong Luo, De Cheng, Qingchun Yang, Lingyan Ran, Yinghui Xing, Yanning Zhang
- Abstract summary: We construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID.
G2A-VReID dataset has the following characteristics: 1) Drastic view changes; 2) Large number of annotated identities; 3) Rich outdoor scenarios; 4) Huge difference in resolution.
- Score: 37.53617620654204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under ground-to-aerial scenarios. The G2A-VReID dataset has the following characteristics: 1) drastic view changes; 2) a large number of annotated identities; 3) rich outdoor scenarios; 4) huge differences in resolution. Additionally, we propose a new benchmark approach for cross-platform ReID that transforms the cross-platform visual alignment problem into visual-semantic alignment through a vision-language model (i.e., CLIP) and applies a parameter-efficient Video Set-Level-Adapter module to adapt the image-based foundation model to video ReID tasks; we term the resulting method VSLA-CLIP. In addition, to further reduce the large discrepancy across platforms, we devise platform-bridge prompts for efficient visual feature alignment. Extensive experiments demonstrate the superiority of the proposed method on all existing video ReID datasets and our proposed G2A-VReID dataset.
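The abstract describes the approach only at a high level. As a rough illustration, the sketch below shows one way a set-level bottleneck adapter and learnable platform-bridge prompt tokens could sit on top of a frozen CLIP backbone; all module and variable names are hypothetical, not taken from the paper's code.

```python
# Minimal sketch (not the authors' code) of a parameter-efficient
# set-level adapter plus platform-bridge prompts over a frozen CLIP.
import torch
import torch.nn as nn

class SetLevelAdapter(nn.Module):
    """Bottleneck adapter over per-frame CLIP features, followed by
    set-level (tracklet) average pooling."""
    def __init__(self, dim=512, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, frame_feats):          # frame_feats: (B, T, D)
        x = frame_feats + self.up(self.act(self.down(frame_feats)))
        return x.mean(dim=1)                 # one feature per tracklet: (B, D)

class PlatformBridgePrompt(nn.Module):
    """Learnable prompt tokens, one set per platform (ground / aerial),
    prepended to the text token embeddings to absorb the platform gap."""
    def __init__(self, n_tokens=4, dim=512, n_platforms=2):
        super().__init__()
        self.tokens = nn.Parameter(0.02 * torch.randn(n_platforms, n_tokens, dim))

    def forward(self, text_emb, platform_id):   # text_emb: (B, L, D)
        bridge = self.tokens[platform_id].expand(text_emb.size(0), -1, -1)
        return torch.cat([bridge, text_emb], dim=1)  # (B, n_tokens + L, D)
```

In this reading, the tracklet feature from SetLevelAdapter would be matched against identity text embeddings processed through PlatformBridgePrompt, with the CLIP backbone frozen and only the adapter and prompt parameters trained, which is what would make the scheme parameter-efficient.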
Related papers
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation [22.782099757385804]
TIP-I2V is the first large-scale dataset of user-provided text and image prompts for image-to-video generation.
We provide the corresponding generated videos from five state-of-the-art image-to-video models.
arXiv Detail & Related papers (2024-11-05T18:52:43Z)
- Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets [62.280729345770936]
We introduce the task of Alignable Video Retrieval (AVR)
Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query.
Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach.
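The abstract does not state how the synchronization is computed; a standard baseline for temporally aligning two clips from per-frame embeddings is dynamic time warping (DTW), sketched below. The function names and the length normalization are illustrative choices, not the paper's method.

```python
# Generic retrieval-by-alignability sketch using plain DTW over
# per-frame embeddings (an illustrative baseline, not the paper's model).
import numpy as np

def dtw_cost(q, c):
    """Alignment cost between frame-embedding sequences q: (Tq, D) and
    c: (Tc, D); a lower cost means the clips align better."""
    # cosine-distance matrix between every query/candidate frame pair
    qn = q / np.linalg.norm(q, axis=1, keepdims=True).clip(1e-8)
    cn = c / np.linalg.norm(c, axis=1, keepdims=True).clip(1e-8)
    dist = 1.0 - qn @ cn.T
    Tq, Tc = dist.shape
    D = np.full((Tq + 1, Tc + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Tq + 1):
        for j in range(1, Tc + 1):
            D[i, j] = dist[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[Tq, Tc] / (Tq + Tc)   # length-normalised alignment cost

def retrieve_alignable(query_feats, candidate_feats, k=5):
    """Rank candidate clips by alignment cost to the query."""
    costs = [dtw_cost(query_feats, c) for c in candidate_feats]
    return np.argsort(costs)[:k]
```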
arXiv Detail & Related papers (2024-09-02T20:00:49Z)
- VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization [115.64739269488965]
VimTS enhances the generalization ability of the model by achieving better synergy among different tasks.
We propose a synthetic video text dataset (VTD-368k) by leveraging the Content Deformation Fields (CoDeF) algorithm.
For video-level cross-domain adaptation, our method even surpasses the previous end-to-end video spotting method on ICDAR2015 video and DSText v2.
arXiv Detail & Related papers (2024-04-30T15:49:03Z)
- DAM: Dynamic Adapter Merging for Continual Video QA Learning [66.43360542692355]
We present a parameter-efficient method for continual video question-answering (VidQA) learning.
Our method uses the proposed Dynamic Adapter Merging to (i) mitigate catastrophic forgetting, (ii) enable efficient adaptation to continually arriving datasets, and (iii) enable knowledge sharing across similar dataset domains.
Our DAM model outperforms prior state-of-the-art continual learning approaches by 9.1% while exhibiting 1.9% less forgetting on 6 VidQA datasets spanning various domains.
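At a high level, adapter merging amounts to averaging per-dataset adapter weights with input-dependent coefficients. The sketch below is a guess at that mechanism under simple assumptions (a cosine-similarity router over hypothetical dataset centroids), not the authors' implementation.

```python
# Illustrative sketch of dynamic adapter merging: per-dataset adapters
# are averaged with input-dependent weights at inference time.
import torch
import torch.nn.functional as F

def route(video_feat, dataset_centroids, temperature=0.1):
    """Hypothetical router: softmax cosine similarity of the input to one
    centroid per seen dataset gives the merging weights (sums to 1)."""
    sims = F.cosine_similarity(video_feat.unsqueeze(0), dataset_centroids, dim=-1)
    return torch.softmax(sims / temperature, dim=0)

def merge_adapters(adapter_state_dicts, weights):
    """Weighted average of per-dataset adapter parameter dicts."""
    merged = {}
    for name in adapter_state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, adapter_state_dicts))
    return merged
```

A frozen backbone would then load the merged adapter for each input; because the old adapters are kept intact and only combined at inference, earlier datasets are not overwritten, which is how a scheme like this sidesteps catastrophic forgetting.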
arXiv Detail & Related papers (2024-03-13T17:53:47Z)
- AG-ReID.v2: Bridging Aerial and Ground Views for Person Re-identification [39.58286453178339]
Aerial-ground person re-identification (Re-ID) presents unique challenges in computer vision.
We introduce AG-ReID.v2, a dataset specifically designed for person Re-ID in mixed aerial and ground scenarios.
This dataset comprises 100,502 images of 1,615 unique individuals, each annotated with matching IDs and 15 soft attribute labels.
arXiv Detail & Related papers (2024-01-05T04:53:33Z)
- Ground-to-Aerial Person Search: Benchmark Dataset and Approach [42.54151390290665]
We construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS.
G2APS contains 31,770 images with 260,559 annotated bounding boxes for 2,644 identities that appear in both UAV and ground surveillance cameras.
arXiv Detail & Related papers (2023-08-24T11:11:26Z)
- NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation [58.21817572577012]
Video depth estimation aims to infer temporally consistent depth.
We introduce NVDS+, which stabilizes inconsistent depth estimated by various single-image models in a plug-and-play manner.
We also construct a large-scale Video Depth in the Wild dataset, which contains 14,203 videos with over two million frames.
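NVDS+ itself uses a learned stabilization network; for contrast, the sketch below shows about the simplest plug-and-play alternative one could apply to any single-image model's output: least-squares scale/shift alignment between consecutive frames followed by exponential smoothing. This is a strawman baseline, not the paper's method.

```python
# Naive plug-and-play depth stabilization baseline (illustrative only).
import numpy as np

def align_scale_shift(d, ref):
    """Least-squares s, t such that s * d + t approximates ref."""
    A = np.stack([d.ravel(), np.ones(d.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, ref.ravel(), rcond=None)
    return s * d + t

def stabilize(depths, alpha=0.85):
    """depths: list of (H, W) maps from any single-image depth model."""
    out, prev = [], None
    for d in depths:
        if prev is not None:
            d = align_scale_shift(d, prev)      # undo per-frame scale drift
            d = alpha * prev + (1 - alpha) * d  # exponential smoothing
        out.append(d)
        prev = d
    return out
```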
arXiv Detail & Related papers (2023-07-17T17:57:01Z)
- TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
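The two-stage structure can be made concrete with a toy sketch: stage one correlates the query feature against every frame and takes the argmax; stage two nudges each point along the local correlation gradient. The actual refinement stage is a learned iterative update over local correlation features, so everything below is illustrative only.

```python
# Toy two-stage point-tracking sketch (not the released TAPIR code).
import torch
import torch.nn.functional as F

def match_stage(query_feat, frame_feats):
    """query_feat: (D,); frame_feats: (T, D, H, W).
    Correlate the query against every frame and take the argmax,
    giving an initial (T, 2) track of (x, y) positions."""
    corr = torch.einsum('d,tdhw->thw', query_feat, frame_feats)
    T, H, W = corr.shape
    idx = corr.flatten(1).argmax(dim=1)
    return torch.stack([idx % W, idx // W], dim=1).float()

def refine_stage(track, query_feat, frame_feats, iters=4, step=0.5):
    """Nudge each point up the local correlation gradient
    (the real model uses a learned iterative update instead)."""
    T, D, H, W = frame_feats.shape
    track = track.clone().requires_grad_(True)
    for _ in range(iters):
        gx = track[:, 0] / (W - 1) * 2 - 1       # normalise x to [-1, 1]
        gy = track[:, 1] / (H - 1) * 2 - 1       # normalise y to [-1, 1]
        grid = torch.stack([gx, gy], dim=1).view(T, 1, 1, 2)
        feats = F.grid_sample(frame_feats, grid, align_corners=True).view(T, D)
        score = (feats * query_feat).sum()       # correlation at current points
        grad, = torch.autograd.grad(score, track)
        track = (track + step * grad).detach().requires_grad_(True)
    return track.detach()
```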
arXiv Detail & Related papers (2023-06-14T17:07:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.