Diverse Semantics-Guided Feature Alignment and Decoupling for Visible-Infrared Person Re-Identification
- URL: http://arxiv.org/abs/2505.00619v1
- Date: Thu, 01 May 2025 15:55:38 GMT
- Title: Diverse Semantics-Guided Feature Alignment and Decoupling for Visible-Infrared Person Re-Identification
- Authors: Neng Dong, Shuanglin Yan, Liyan Zhang, Jinhui Tang
- Abstract summary: Visible-Infrared Person Re-Identification (VI-ReID) is a challenging task due to the large modality discrepancy between visible and infrared images. We propose a novel Diverse Semantics-guided Feature Alignment and Decoupling (DSFAD) network to align identity-relevant features from different modalities into a textual embedding space.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visible-Infrared Person Re-Identification (VI-ReID) is a challenging task due to the large modality discrepancy between visible and infrared images, which complicates the alignment of their features into a suitable common space. Moreover, style noise, such as illumination and color contrast, reduces the identity discriminability and modality invariance of features. To address these challenges, we propose a novel Diverse Semantics-guided Feature Alignment and Decoupling (DSFAD) network to align identity-relevant features from different modalities into a textual embedding space and disentangle identity-irrelevant features within each modality. Specifically, we develop a Diverse Semantics-guided Feature Alignment (DSFA) module, which generates pedestrian descriptions with diverse sentence structures to guide the cross-modality alignment of visual features. Furthermore, to filter out style information, we propose a Semantic Margin-guided Feature Decoupling (SMFD) module, which decomposes visual features into pedestrian-related and style-related components, and then constrains the similarity between the former and the textual embeddings to be at least a margin higher than that between the latter and the textual embeddings. Additionally, to prevent the loss of pedestrian semantics during feature decoupling, we design a Semantic Consistency-guided Feature Restitution (SCFR) module, which further excavates identity-relevant information from the style-related features and restores it into the pedestrian-related features, and then constrains the similarity between the restituted features and the textual embeddings to be consistent with that between the features before decoupling and the textual embeddings. Extensive experiments on three VI-ReID datasets demonstrate the superiority of our DSFAD.
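To make the two constraints above concrete, here is a minimal PyTorch sketch of how the SMFD margin constraint and the SCFR consistency constraint could be expressed. The function names, the choice of cosine similarity, and the hinge/L1 formulations are illustrative assumptions for exposition, not the paper's exact losses.

```python
import torch.nn.functional as F

def smfd_margin_loss(f_ped, f_style, t_emb, margin=0.2):
    """Illustrative SMFD-style constraint: pedestrian-related features should
    be at least `margin` more similar to the textual embeddings than the
    style-related features are (hinge penalty when violated)."""
    sim_ped = F.cosine_similarity(f_ped, t_emb, dim=-1)
    sim_style = F.cosine_similarity(f_style, t_emb, dim=-1)
    return F.relu(sim_style + margin - sim_ped).mean()

def scfr_consistency_loss(f_restituted, f_before, t_emb):
    """Illustrative SCFR-style constraint: the text similarity of features
    after restitution should match that of the features before decoupling."""
    sim_after = F.cosine_similarity(f_restituted, t_emb, dim=-1)
    sim_before = F.cosine_similarity(f_before, t_emb, dim=-1)
    return (sim_after - sim_before).abs().mean()
```

In practice such terms would be weighted and added to the usual identification objectives; the margin value is a hyperparameter.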
Related papers
- Embedding and Enriching Explicit Semantics for Visible-Infrared Person Re-Identification
Visible-infrared person re-identification (VIReID) retrieves pedestrian images with the same identity across different modalities.
Existing methods learn visual content solely from images, lacking the capability to sense high-level semantics.
We propose an Embedding and Enriching Explicit Semantics framework to learn semantically rich cross-modality pedestrian representations.
arXiv Detail & Related papers (2024-12-11T14:27:30Z)
- See What You Seek: Semantic Contextual Integration for Cloth-Changing Person Re-Identification
Cloth-changing person re-identification (CC-ReID) aims to match individuals across multiple surveillance cameras despite variations in clothing.
Existing methods typically focus on mitigating the effects of clothing changes or enhancing ID-relevant features.
We propose a novel prompt learning framework, Semantic Contextual Integration (SCI), for CC-ReID.
arXiv Detail & Related papers (2024-12-02T10:11:16Z)
- Language Guided Domain Generalized Medical Image Segmentation
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings.
We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features.
Our approach achieves favorable performance against existing methods in the literature.
arXiv Detail & Related papers (2024-04-01T17:48:15Z)
- CLIP-Driven Semantic Discovery Network for Visible-Infrared Person Re-Identification
Cross-modality identity matching poses significant challenges in VIReID.
We propose a CLIP-Driven Semantic Discovery Network (CSDN) that consists of Modality-specific Prompt Learner, Semantic Information Integration, and High-level Semantic Embedding.
arXiv Detail & Related papers (2024-01-11T10:20:13Z)
- Shape-centered Representation Learning for Visible-Infrared Person Re-identification
Visible-Infrared Person Re-Identification (VI-ReID) plays a critical role in all-day surveillance systems.
Existing methods primarily focus on learning appearance features while overlooking body shape features.
We propose the Shape-centered Representation Learning (ScRL) framework, which enhances VI-ReID performance by integrating shape and appearance features.
arXiv Detail & Related papers (2023-10-27T07:57:24Z)
- Learning Cross-modality Information Bottleneck Representation for Heterogeneous Person Re-Identification
Visible-Infrared person re-identification (VI-ReID) is an important and challenging task in intelligent video surveillance.
Existing methods mainly focus on learning a shared feature space to reduce the modality discrepancy between visible and infrared modalities.
We present a novel mutual information and modality consensus network, namely CMInfoNet, to extract modality-invariant identity features.
arXiv Detail & Related papers (2023-08-29T06:55:42Z)
- Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification
We propose a novel FIne-grained Representation and Recomposition (FIRe$^2$) framework to tackle these limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe$^2$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z)
- Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
Body shape is one of the significant modality-shared cues for VI-ReID.
We propose a shape-erased feature learning paradigm that decorrelates modality-shared features in two subspaces.
Experiments on SYSU-MM01, RegDB, and HITSZ-VCM datasets demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-04-09T10:22:10Z)
- CycleTrans: Learning Neutral yet Discriminative Features for Visible-Infrared Person Re-Identification
Visible-infrared person re-identification (VI-ReID) is a task of matching the same individuals across the visible and infrared modalities.
Existing VI-ReID methods mainly focus on learning general features across modalities, often at the expense of feature discriminability.
We present a novel cycle-construction-based network for neutral yet discriminative feature learning, termed CycleTrans.
arXiv Detail & Related papers (2022-08-21T08:41:40Z)
- Learning Semantic-Aligned Feature Representation for Text-based Person Search
We propose a semantic-aligned embedding method for text-based person search.
Feature alignment across modalities is achieved by automatically learning semantically aligned visual and textual features.
Experimental results on the CUHK-PEDES and Flickr30K datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-13T14:54:38Z)
- Exploring Modality-shared Appearance Features and Modality-invariant Relation Features for Cross-modality Person Re-Identification
Cross-modality person re-identification methods rely on discriminative modality-shared features.
Despite some initial success, such modality-shared appearance features cannot capture enough modality-invariant information.
A novel cross-modality quadruplet loss is proposed to further reduce the cross-modality variations (a generic sketch follows this entry).
arXiv Detail & Related papers (2021-04-23T11:14:07Z)
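As a quick illustration of the quadruplet idea mentioned above, here is a generic cross-modality quadruplet loss sketched in PyTorch: the anchor and positive share an identity but come from different modalities, while the two negatives are distinct identities. This is an assumed, generic formulation for exposition, not necessarily the exact loss used in that paper.

```python
import torch.nn.functional as F

def cross_modality_quadruplet_loss(anchor, pos, neg1, neg2, m1=0.3, m2=0.15):
    """Generic quadruplet loss (illustrative): pull the cross-modality
    positive pair together, push the anchor away from neg1, and additionally
    push the positive distance below that of the negative pair (neg1, neg2),
    which does not involve the anchor."""
    d_ap = F.pairwise_distance(anchor, pos)
    d_an = F.pairwise_distance(anchor, neg1)
    d_nn = F.pairwise_distance(neg1, neg2)
    return (F.relu(d_ap - d_an + m1) + F.relu(d_ap - d_nn + m2)).mean()
```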