See What You Seek: Semantic Contextual Integration for Cloth-Changing Person Re-Identification
- URL: http://arxiv.org/abs/2412.01345v2
- Date: Sun, 18 May 2025 09:18:08 GMT
- Title: See What You Seek: Semantic Contextual Integration for Cloth-Changing Person Re-Identification
- Authors: Xiyu Han, Xian Zhong, Wenxin Huang, Xuemei Jia, Xiaohan Yu, Alex Chichung Kot
- Abstract summary: Cloth-changing person re-identification (CC-ReID) aims to match individuals across surveillance cameras despite variations in clothing. Existing methods typically mitigate the impact of clothing changes or enhance identity (ID)-relevant features. We propose a novel prompt learning framework, Semantic Contextual Integration (SCI), to reduce clothing-induced discrepancies and strengthen ID cues.
- Score: 14.01260112340177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cloth-changing person re-identification (CC-ReID) aims to match individuals across surveillance cameras despite variations in clothing. Existing methods typically mitigate the impact of clothing changes or enhance identity (ID)-relevant features, but they often struggle to capture complex semantic information. In this paper, we propose a novel prompt learning framework Semantic Contextual Integration (SCI), which leverages the visual-textual representation capabilities of CLIP to reduce clothing-induced discrepancies and strengthen ID cues. Specifically, we introduce the Semantic Separation Enhancement (SSE) module, which employs dual learnable text tokens to disentangle clothing-related semantics from confounding factors, thereby isolating ID-relevant features. Furthermore, we develop a Semantic-Guided Interaction Module (SIM) that uses orthogonalized text features to guide visual representations, sharpening the focus of the model on distinctive ID characteristics. This semantic integration improves the discriminative power of the model and enriches the visual context with high-dimensional insights. Extensive experiments on three CC-ReID datasets demonstrate that our method outperforms state-of-the-art techniques. The code will be released at https://github.com/hxy-499/CCREID-SCI.
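The abstract describes two mechanisms: dual learnable text tokens that separate clothing semantics from identity cues (SSE), and orthogonalized text features that steer the visual representation (SIM). The snippet below is a minimal, hypothetical PyTorch sketch of that idea under stated assumptions; the module name, dimensions, text-encoder interface, and the channel-gating step are illustrative choices, not the authors' released implementation (see the repository above for the official code).

```python
# Hypothetical sketch, not the authors' implementation: dual learnable text tokens
# plus orthogonalized text guidance of visual features, in the spirit of SSE/SIM.
import torch
import torch.nn.functional as F
from torch import nn

class SemanticGuidanceSketch(nn.Module):
    def __init__(self, embed_dim: int = 512, num_ctx: int = 4):
        super().__init__()
        # Dual learnable prompt contexts: one for clothing semantics, one for ID cues.
        self.cloth_ctx = nn.Parameter(torch.randn(num_ctx, embed_dim) * 0.02)
        self.id_ctx = nn.Parameter(torch.randn(num_ctx, embed_dim) * 0.02)

    def forward(self, visual_feat: torch.Tensor, text_encoder) -> torch.Tensor:
        # `text_encoder` is assumed to map a (num_ctx, D) prompt context to a single
        # (D,) embedding, e.g. a frozen CLIP text encoder fed with prompt tokens.
        t_cloth = F.normalize(text_encoder(self.cloth_ctx), dim=-1)
        t_id = F.normalize(text_encoder(self.id_ctx), dim=-1)

        # Remove the clothing direction from the ID text feature, so the guidance
        # signal is (approximately) orthogonal to clothing-related semantics.
        t_id_orth = t_id - (t_id @ t_cloth) * t_cloth
        t_id_orth = F.normalize(t_id_orth, dim=-1)

        # Use the orthogonalized text feature to re-weight the (B, D) visual feature;
        # a simple channel gate stands in for the paper's richer interaction module.
        gate = torch.sigmoid(visual_feat * t_id_orth)
        return visual_feat + gate * visual_feat
```

Projecting out the clothing direction before guidance is the simplest reading of "orthogonalized text features"; the actual SIM module may implement a more elaborate visual-textual interaction.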
Related papers
- ID-EA: Identity-driven Text Enhancement and Adaptation with Textual Inversion for Personalized Text-to-Image Generation [33.84646269805187]
ID-EA is a novel framework that guides text embeddings to align with visual identity embeddings. ID-EA substantially outperforms state-of-the-art methods in identity preservation metrics. It generates personalized portraits 15 times faster than existing approaches.
arXiv Detail & Related papers (2025-07-16T07:42:02Z) - Diverse Semantics-Guided Feature Alignment and Decoupling for Visible-Infrared Person Re-Identification [31.011118085494942]
Visible-Infrared Person Re-Identification (VI-ReID) is a challenging task due to the large modality discrepancy between visible and infrared images.
We propose a novel Diverse Semantics-guided Feature Alignment and Decoupling (DSFAD) network to align identity-relevant features from different modalities into a textual embedding space.
arXiv Detail & Related papers (2025-05-01T15:55:38Z) - LATex: Leveraging Attribute-based Text Knowledge for Aerial-Ground Person Re-Identification [63.07563443280147]
We propose a novel framework named LATex for AG-ReID.
It adopts prompt-tuning strategies to leverage attribute-based text knowledge.
Our framework can fully leverage attribute-based text knowledge to improve AG-ReID performance.
arXiv Detail & Related papers (2025-03-31T04:47:05Z) - DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID [15.204652332980672]
Clothes-changing person re-identification (CC-ReID) aims to recognize individuals under different clothing scenarios. Current CC-ReID approaches typically concentrate on modeling body shape using additional modalities such as silhouettes, pose, and body meshes. We propose DIFFER: Disentangle Identity Features From Entangled Representations, a novel adversarial learning method that leverages textual descriptions to disentangle identity features.
arXiv Detail & Related papers (2025-03-28T23:40:59Z) - CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification [47.948622774810296]
We propose a novel framework called CLIP-Driven Cloth-Agnostic Feature Learning (CCAF) for Cloth-Changing Person Re-Identification (CC-ReID).
Two modules are custom-designed: Invariant Feature Prompting (IFP) and Clothes Feature Minimization (CFM).
Experiments have demonstrated the effectiveness of the proposed CCAF, achieving new state-of-the-art performance on several popular CC-ReID benchmarks without any additional inference time.
arXiv Detail & Related papers (2024-06-13T14:56:07Z) - Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning [114.59476118365266]
We propose AENet, which incorporates semantic information into the visual prompt to distill a semantic-enhanced prompt for visual representation enrichment.
AENet comprises two key steps: 1) exploring concept-harmonized tokens for the visual and attribute modalities, grounded on the modal-sharing token that represents consistent visual-semantic concepts; and 2) yielding the semantic-enhanced prompt via a visual residual refinement unit with attribute consistency supervision.
arXiv Detail & Related papers (2024-06-05T07:59:48Z) - Content and Salient Semantics Collaboration for Cloth-Changing Person Re-Identification [74.10897798660314]
Cloth-changing person Re-IDentification aims at recognizing the same person with clothing changes across non-overlapping cameras.
We propose the Content and Salient Semantics Collaboration framework, facilitating cross-parallel semantics interaction and refinement.
Our framework is simple yet effective, and the vital design is the Semantics Mining and Refinement (SMR) module.
arXiv Detail & Related papers (2024-05-26T15:17:28Z) - Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning [56.65891462413187]
We propose a progressive semantic-guided vision transformer for zero-shot learning (dubbed ZSLViT).
ZSLViT first introduces semantic-embedded token learning to improve the visual-semantic correspondences via semantic enhancement.
Then, we fuse visual tokens with low semantic-visual correspondence to discard semantics-unrelated visual information for visual enhancement.
arXiv Detail & Related papers (2024-04-11T12:59:38Z) - CLIP-Driven Semantic Discovery Network for Visible-Infrared Person Re-Identification [39.262536758248245]
Cross-modality identity matching poses significant challenges in VIReID.
We propose a CLIP-Driven Semantic Discovery Network (CSDN) that consists of Modality-specific Prompt Learner, Semantic Information Integration, and High-level Semantic Embedding.
arXiv Detail & Related papers (2024-01-11T10:20:13Z) - Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe$^2$) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe$^2$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z) - Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification [90.39454748065558]
Body shape is one of the significant modality-shared cues for VI-ReID.
We propose a shape-erased feature learning paradigm that decorrelates modality-shared features in two subspaces.
Experiments on SYSU-MM01, RegDB, and HITSZ-VCM datasets demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-04-09T10:22:10Z) - A Semantic-aware Attention and Visual Shielding Network for Cloth-changing Person Re-identification [29.026249268566303]
Cloth-changing person re-identification (ReID) is a newly emerging research topic that aims to retrieve pedestrians whose clothes have changed.
Since a person's appearance varies greatly across different clothes, it is very difficult for existing approaches to extract discriminative and robust feature representations.
This work proposes a novel semantic-aware attention and visual shielding network for cloth-changing person ReID.
arXiv Detail & Related papers (2022-07-18T05:38:37Z) - Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework that jointly encodes video data and text labels for zero-shot action recognition (ZSAR).
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experimental results show our model considerably improves upon the state of the art in ZSAR, reaching encouraging top-1 accuracy on the UCF101, HMDB51, and ActivityNet benchmark datasets.
arXiv Detail & Related papers (2022-05-03T17:39:27Z)
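As context for the cross-modal matching idea that recurs across the entries above (the ZSAR framework immediately preceding, and the CLIP-driven ReID methods), the following is a minimal, hypothetical sketch of zero-shot classification by matching a visual embedding against text-label embeddings; the function name, encoder interfaces, and temperature value are assumptions, not taken from any paper listed here.

```python
# Hypothetical sketch of zero-shot recognition by visual-text embedding matching
# (not the code of any paper above). The encoders producing the inputs are stand-ins.
import torch
import torch.nn.functional as F

def zero_shot_classify(visual_feat: torch.Tensor,
                       class_text_feats: torch.Tensor,
                       temperature: float = 0.07) -> torch.Tensor:
    """visual_feat: (B, D) embeddings from a video/image encoder.
    class_text_feats: (C, D) embeddings of class-name prompts from a text encoder.
    Returns (B, C) class probabilities; unseen classes are handled simply by
    adding their text embeddings, which is the core of the zero-shot setting."""
    v = F.normalize(visual_feat, dim=-1)
    t = F.normalize(class_text_feats, dim=-1)
    logits = (v @ t.t()) / temperature  # temperature-scaled cosine similarities
    return logits.softmax(dim=-1)
```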
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.