SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification
- URL: http://arxiv.org/abs/2504.09549v1
- Date: Sun, 13 Apr 2025 12:44:50 GMT
- Title: SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification
- Authors: Xiang Hu, Pingping Zhang, Yuhao Wang, Bin Yan, Huchuan Lu
- Abstract summary: We propose a novel two-stage feature learning framework named SD-ReID for AG-ReID. In the first stage, we train a simple ViT-based model to extract coarse-grained representations and controllable conditions. In the second stage, we fine-tune the SD model to learn complementary representations guided by the controllable conditions.
- Score: 61.753607285860944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aerial-Ground Person Re-IDentification (AG-ReID) aims to retrieve specific persons across cameras with different viewpoints. Previous works focus on designing discriminative ReID models to maintain identity consistency despite drastic changes in camera viewpoints. The core idea behind these methods is quite natural, but designing a view-robust network is a very challenging task. Moreover, they overlook the contribution of view-specific features in enhancing the model's capability to represent persons. To address these issues, we propose a novel two-stage feature learning framework named SD-ReID for AG-ReID, which takes advantage of the powerful understanding capacity of generative models, e.g., Stable Diffusion (SD), to generate view-specific features between different viewpoints. In the first stage, we train a simple ViT-based model to extract coarse-grained representations and controllable conditions. Then, in the second stage, we fine-tune the SD model to learn complementary representations guided by the controllable conditions. Furthermore, we propose the View-Refine Decoder (VRD) to obtain additional controllable conditions to generate missing cross-view features. Finally, we use the coarse-grained representations and all-view features generated by SD to retrieve target persons. Extensive experiments on the AG-ReID benchmarks demonstrate the effectiveness of our proposed SD-ReID. The source code will be available upon acceptance.
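The abstract describes retrieval that fuses the stage-1 coarse-grained representations with the view-specific features generated by SD. As a minimal sketch of that final step (the fusion rule and all array names here are hypothetical illustrations, not the paper's actual method), one could concatenate the two feature types per image, L2-normalise, and rank the gallery by cosine similarity:

```python
import numpy as np

def l2norm(x):
    """L2-normalise along the last axis (small epsilon avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

def fuse_and_retrieve(coarse_q, view_q, coarse_g, view_g):
    """Hypothetical fusion for cross-view retrieval: concatenate the
    coarse (stage-1) and view-specific (stage-2) features of the query
    and each gallery image, then rank the gallery by cosine similarity."""
    q = l2norm(np.concatenate([coarse_q, view_q], axis=-1))  # (D1 + D2,)
    g = l2norm(np.concatenate([coarse_g, view_g], axis=-1))  # (N, D1 + D2)
    scores = g @ q                       # cosine similarity per gallery image
    return np.argsort(-scores)           # gallery indices, best match first
```

This is only the inference-time retrieval skeleton; the interesting parts of SD-ReID (fine-tuning SD under controllable conditions, the View-Refine Decoder) happen upstream of this fusion.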
Related papers
- Exploring Stronger Transformer Representation Learning for Occluded Person Re-Identification [2.552131151698595]
We proposed a novel self-supervision and supervision combining transformer-based person re-identification framework, namely SSSC-TransReID.
We designed a self-supervised contrastive learning branch, which can enhance the feature representation for person re-identification without negative samples or additional pre-training.
Our proposed model obtains superior Re-ID performance consistently and outperforms the state-of-the-art ReID methods by large margins on the mean average accuracy (mAP) and Rank-1 accuracy.
arXiv Detail & Related papers (2024-10-21T03:17:25Z) - PartFormer: Awakening Latent Diverse Representation from Vision Transformer for Object Re-Identification [73.64560354556498]
Vision Transformer (ViT) tends to overfit on most distinct regions of training data, limiting its generalizability and attention to holistic object features.
We present PartFormer, an innovative adaptation of ViT designed to overcome the limitations in object Re-ID tasks.
Our framework significantly outperforms state-of-the-art by 2.4% mAP scores on the most challenging MSMT17 dataset.
arXiv Detail & Related papers (2024-08-29T16:31:05Z) - Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval [85.73149096516543]
We address the choice of viewpoint during sketch creation in Fine-Grained Sketch-Based Image Retrieval (FG-SBIR).
A pilot study highlights the system's struggle when query-sketches differ in viewpoint from target instances.
To reconcile this, we advocate for a view-aware system, seamlessly accommodating both view-agnostic and view-specific tasks.
arXiv Detail & Related papers (2024-07-01T21:20:44Z) - ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z) - Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors [56.82596340418697]
We propose a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors.
Comprehensive investigations unveil potential characteristics of Vermouth, such as varying granularity of perception concealed in latent variables at distinct time steps and various U-net stages.
The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
arXiv Detail & Related papers (2024-01-29T10:36:57Z) - Learning Invariance from Generated Variance for Unsupervised Person Re-identification [15.096776375794356]
We propose to replace traditional data augmentation with a generative adversarial network (GAN).
A 3D mesh guided person image generator is proposed to disentangle a person image into id-related and id-unrelated features.
By jointly training the generative and the contrastive modules, our method achieves new state-of-the-art unsupervised person ReID performance on mainstream large-scale benchmarks.
arXiv Detail & Related papers (2023-01-02T15:40:14Z) - Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification [24.63519986072777]
Cross-camera images could be unavailable under the ISolated Camera Supervised setting, e.g., a surveillance system deployed across distant scenes.
A new pipeline is introduced by synthesizing the cross-camera samples in the feature space for model training.
Experiments on two ISCS person Re-ID datasets demonstrate the superiority of our CCSFG to the competitors.
arXiv Detail & Related papers (2022-03-29T03:10:24Z) - Fine-Grained Re-Identification [1.8275108630751844]
This paper proposes a computationally efficient fine-grained ReID model, FGReID, which is among the first models to unify image and video ReID.
FGReID takes advantage of video-based pre-training and spatial feature attention to improve performance on both video and image ReID tasks.
arXiv Detail & Related papers (2020-11-26T21:04:17Z) - Cross-Resolution Adversarial Dual Network for Person Re-Identification and Beyond [59.149653740463435]
Person re-identification (re-ID) aims at matching images of the same person across camera views.
Due to varying distances between cameras and persons of interest, resolution mismatch can be expected.
We propose a novel generative adversarial network to address cross-resolution person re-ID.
arXiv Detail & Related papers (2020-02-19T07:21:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.