SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification
- URL: http://arxiv.org/abs/2504.09549v2
- Date: Thu, 30 Oct 2025 12:00:18 GMT
- Title: SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification
- Authors: Yuhao Wang, Xiang Hu, Lixin Wang, Pingping Zhang, Huchuan Lu
- Abstract summary: We propose a novel generative framework named SD-ReID for AG-ReID. We first train a ViT-based model to extract person representations along with controllable conditions, including identity and view conditions. We then fine-tune the Stable Diffusion (SD) model to enhance person representations guided by these controllable conditions.
- Score: 74.36139886192495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aerial-Ground Person Re-IDentification (AG-ReID) aims to retrieve specific persons across cameras with different viewpoints. Previous works focus on designing discriminative models to maintain the identity consistency despite drastic changes in camera viewpoints. The core idea behind these methods is quite natural, but designing a view-robust model is a very challenging task. Moreover, they overlook the contribution of view-specific features in enhancing the model's ability to represent persons. To address these issues, we propose a novel generative framework named SD-ReID for AG-ReID, which leverages generative models to mimic the feature distribution of different views while extracting robust identity representations. More specifically, we first train a ViT-based model to extract person representations along with controllable conditions, including identity and view conditions. We then fine-tune the Stable Diffusion (SD) model to enhance person representations guided by these controllable conditions. Furthermore, we introduce the View-Refined Decoder (VRD) to bridge the gap between instance-level and global-level features. Finally, both person representations and all-view features are employed to retrieve target persons. Extensive experiments on five AG-ReID benchmarks (i.e., CARGO, AG-ReIDv1, AG-ReIDv2, LAGPeR and G2APS-ReID) demonstrate the effectiveness of our proposed method. The source code will be available.
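The abstract's final retrieval step, in which both person representations and all-view features are used to retrieve target persons, can be illustrated with a minimal sketch. The abstract does not say how the two kinds of features are combined or compared, so the concatenation, the L2 normalization, and the cosine-similarity ranking below are assumptions, and the function names (`fuse`, `rank_gallery`) are hypothetical rather than taken from the authors' code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(person_feat, view_feats):
    """Concatenate the person representation with each per-view feature.

    Assumption: simple concatenation of L2-normalized parts. The paper may
    combine them differently.
    """
    fused = l2_normalize(person_feat)
    for view_feat in view_feats:
        fused = fused + l2_normalize(view_feat)
    return l2_normalize(fused)


def cosine(a, b):
    """Cosine similarity of two already unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_gallery(query, gallery):
    """Return gallery ids sorted from most to least similar to the query."""
    scores = {gid: cosine(query, feat) for gid, feat in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy 2-D features standing in for ViT outputs and generated view features.
    query = fuse([1.0, 0.0], [[0.9, 0.1], [0.8, 0.2]])
    gallery = {
        "id_a": fuse([0.9, 0.1], [[1.0, 0.0], [0.7, 0.3]]),
        "id_b": fuse([0.0, 1.0], [[0.1, 0.9], [0.2, 0.8]]),
    }
    print(rank_gallery(query, gallery))
```

In this toy setup the query and `id_a` both point mostly along the first axis, so `id_a` ranks ahead of `id_b`. A real system would replace the hand-written vectors with features from the trained ViT and the fine-tuned SD model.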
Related papers
- DreamID-V:Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer [21.788582116033684]
Video Face Swapping (VFS) requires seamlessly injecting a source identity into a target video. Existing methods struggle to maintain identity similarity and attribute preservation while preserving temporal consistency. We propose a comprehensive framework to seamlessly transfer the superiority of Image Face Swapping to the video domain.
arXiv Detail & Related papers (2026-01-04T08:07:11Z) - OmniPerson: Unified Identity-Preserving Pedestrian Generation [12.060261814704022]
We introduce OmniPerson, the first unified identity-preserving pedestrian generation pipeline for ReID tasks. We present PersonSyn, the first large-scale dataset for multi-reference, controllable pedestrian generation. We will open-source the full, pretrained model, and the PersonSyn dataset.
arXiv Detail & Related papers (2025-12-02T09:24:34Z) - ArbiViewGen: Controllable Arbitrary Viewpoint Camera Data Generation for Autonomous Driving via Stable Diffusion Models [8.314980817044958]
ArbiViewGen is a novel framework for the generation of controllable camera images from arbitrary points of view. We introduce two key components: Feature-Aware Adaptive View Stitching and Cross-View Consistency Self-Supervised Learning.
arXiv Detail & Related papers (2025-08-07T10:24:47Z) - Attribute Guidance With Inherent Pseudo-label For Occluded Person Re-identification [16.586742421279137]
Attribute-Guide ReID (AG-ReID) is a novel framework to extract fine-grained semantic attributes without additional data or annotations. Our framework operates through a two-stage process: first generating attribute pseudo-labels that capture subtle visual characteristics, then introducing a dual-guidance mechanism. Extensive experiments demonstrate that AG-ReID achieves state-of-the-art results on multiple widely-used Re-ID datasets.
arXiv Detail & Related papers (2025-08-07T03:13:24Z) - Exploring Stronger Transformer Representation Learning for Occluded Person Re-Identification [2.552131151698595]
We propose a novel transformer-based person re-identification framework that combines self-supervision and supervision, namely SSSC-TransReID.
We design a self-supervised contrastive learning branch, which can enhance the feature representation for person re-identification without negative samples or additional pre-training.
Our proposed model consistently obtains superior Re-ID performance and outperforms state-of-the-art ReID methods by large margins on mean Average Precision (mAP) and Rank-1 accuracy.
arXiv Detail & Related papers (2024-10-21T03:17:25Z) - PartFormer: Awakening Latent Diverse Representation from Vision Transformer for Object Re-Identification [73.64560354556498]
Vision Transformer (ViT) tends to overfit on most distinct regions of training data, limiting its generalizability and attention to holistic object features.
We present PartFormer, an innovative adaptation of ViT designed to overcome the limitations in object Re-ID tasks.
Our framework significantly outperforms the state of the art by 2.4% mAP on the most challenging MSMT17 dataset.
arXiv Detail & Related papers (2024-08-29T16:31:05Z) - Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval [85.73149096516543]
We address the choice of viewpoint during sketch creation in Fine-Grained Sketch-Based Image Retrieval (FG-SBIR).
A pilot study highlights the system's struggle when query-sketches differ in viewpoint from target instances.
To reconcile this, we advocate for a view-aware system, seamlessly accommodating both view-agnostic and view-specific tasks.
arXiv Detail & Related papers (2024-07-01T21:20:44Z) - Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training [51.87027943520492]
We present a novel paradigm Diffusion-ReID to efficiently augment and generate diverse images based on known identities.
Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities.
arXiv Detail & Related papers (2024-06-10T06:26:03Z) - ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z) - View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network [87.36616083812058]
The view-decoupled transformer (VDT) is proposed as a simple yet effective framework for aerial-ground person re-identification.
Two major components are designed in VDT to decouple view-related and view-unrelated features.
In addition, we contribute a large-scale AGPReID dataset called CARGO, consisting of five/eight aerial/ground cameras, 5,000 identities, and 108,563 images.
arXiv Detail & Related papers (2024-03-21T16:08:21Z) - Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors [56.82596340418697]
We propose a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors.
Comprehensive investigations unveil potential characteristics of Vermouth, such as varying granularity of perception concealed in latent variables at distinct time steps and various U-net stages.
The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
arXiv Detail & Related papers (2024-01-29T10:36:57Z) - Learning Invariance from Generated Variance for Unsupervised Person Re-identification [15.096776375794356]
We propose to replace traditional data augmentation with a generative adversarial network (GAN).
A 3D mesh guided person image generator is proposed to disentangle a person image into id-related and id-unrelated features.
By jointly training the generative and the contrastive modules, our method achieves new state-of-the-art unsupervised person ReID performance on mainstream large-scale benchmarks.
arXiv Detail & Related papers (2023-01-02T15:40:14Z) - Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification [24.63519986072777]
Cross-camera images could be unavailable under the ISolated Camera Supervised setting, e.g., a surveillance system deployed across distant scenes.
A new pipeline is introduced by synthesizing the cross-camera samples in the feature space for model training.
Experiments on two ISCS person Re-ID datasets demonstrate the superiority of our CCSFG over its competitors.
arXiv Detail & Related papers (2022-03-29T03:10:24Z) - Pose Invariant Person Re-Identification using Robust Pose-transformation GAN [11.338815177557645]
Person re-identification (re-ID) aims to retrieve a person's images from an image gallery, given a single instance of the person of interest.
Despite several advancements, learning discriminative identity-sensitive and viewpoint invariant features for robust Person Re-identification is a major challenge owing to large pose variation of humans.
This paper proposes a re-ID pipeline that utilizes the image generation capability of Generative Adversarial Networks combined with pose regression and feature fusion to achieve pose invariant feature learning.
arXiv Detail & Related papers (2021-04-11T15:47:03Z) - Fine-Grained Re-Identification [1.8275108630751844]
This paper proposes a computationally efficient fine-grained ReID model, FGReID, which is among the first models to unify image and video ReID.
FGReID takes advantage of video-based pre-training and spatial feature attention to improve performance on both video and image ReID tasks.
arXiv Detail & Related papers (2020-11-26T21:04:17Z) - Cross-Resolution Adversarial Dual Network for Person Re-Identification and Beyond [59.149653740463435]
Person re-identification (re-ID) aims at matching images of the same person across camera views.
Due to varying distances between cameras and persons of interest, resolution mismatch can be expected.
We propose a novel generative adversarial network to address cross-resolution person re-ID.
arXiv Detail & Related papers (2020-02-19T07:21:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.