CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification
- URL: http://arxiv.org/abs/2406.09198v1
- Date: Thu, 13 Jun 2024 14:56:07 GMT
- Title: CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification
- Authors: Shuang Li, Jiaxu Leng, Guozhang Li, Ji Gan, Haosheng chen, Xinbo Gao,
- Abstract summary: We propose a novel framework called CLIP-Driven Cloth-Agnostic Feature Learning (CCAF) for Cloth-Changing Person Re-Identification (CC-ReID)
Two modules were custom-designed: the Invariant Feature Prompting (IFP) and the Clothes Feature Minimization (CFM)
Experiments have demonstrated the effectiveness of the proposed CCAF, achieving new state-of-the-art performance on several popular CC-ReID benchmarks without any additional inference time.
- Score: 47.948622774810296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive Language-Image Pre-Training (CLIP) has shown impressive performance in short-term Person Re-Identification (ReID) due to its ability to extract high-level semantic features of pedestrians, yet its direct application to Cloth-Changing Person Re-Identification (CC-ReID) faces challenges due to CLIP's image encoder overly focusing on clothes clues. To address this, we propose a novel framework called CLIP-Driven Cloth-Agnostic Feature Learning (CCAF) for CC-ReID. Accordingly, two modules were custom-designed: the Invariant Feature Prompting (IFP) and the Clothes Feature Minimization (CFM). These modules guide the model to extract cloth-agnostic features positively and attenuate clothes-related features negatively. Specifically, IFP is designed to extract fine-grained semantic features unrelated to clothes from the raw image, guided by the cloth-agnostic text prompts. This module first covers the clothes in the raw image at the pixel level to obtain the shielding image and then utilizes CLIP's knowledge to generate cloth-agnostic text prompts. Subsequently, it aligns the raw image-text and the raw image-shielding image in the feature space, emphasizing discriminative clues related to identity but unrelated to clothes. Furthermore, CFM is designed to examine and weaken the image encoder's ability to extract clothes features. It first generates text prompts corresponding to clothes pixels. Then, guided by these clothes text prompts, it iteratively examines and disentangles clothes features from pedestrian features, ultimately retaining inherent discriminative features. Extensive experiments have demonstrated the effectiveness of the proposed CCAF, achieving new state-of-the-art performance on several popular CC-ReID benchmarks without any additional inference time.
Related papers
- Content and Salient Semantics Collaboration for Cloth-Changing Person Re-Identification [74.10897798660314]
Cloth-changing person Re-IDentification aims at recognizing the same person with clothing changes across non-overlapping cameras.
We propose the Content and Salient Semantics Collaboration framework, facilitating cross-parallel semantics interaction and refinement.
Our framework is simple yet effective, and the vital design is the Semantics Mining and Refinement (SMR) module.
arXiv Detail & Related papers (2024-05-26T15:17:28Z) - Identity-aware Dual-constraint Network for Cloth-Changing Person Re-identification [13.709863134725335]
Cloth-Changing Person Re-Identification (CC-ReID) aims to accurately identify the target person in more realistic surveillance scenarios, where pedestrians usually change their clothing.
Despite great progress, limited cloth-changing training samples in existing CC-ReID datasets still prevent the model from adequately learning cloth-irrelevant features.
We propose an Identity-aware Dual-constraint Network (IDNet) for the CC-ReID task.
arXiv Detail & Related papers (2024-03-13T05:46:36Z) - Masked Attribute Description Embedding for Cloth-Changing Person Re-identification [66.53045140286987]
Cloth-changing person re-identification (CC-ReID) aims to match persons who change clothes over long periods.
The key challenge in CC-ReID is to extract clothing-independent features, such as face, hairstyle, body shape, and gait.
We propose a Masked Attribute Description Embedding (MADE) method that unifies personal visual appearance and attribute description for CC-ReID.
arXiv Detail & Related papers (2024-01-11T03:47:13Z) - Semantic-aware Consistency Network for Cloth-changing Person
Re-Identification [8.885551377703944]
We present a Semantic-aware Consistency Network (SCNet) to learn identity-related semantic features.
We generate the black-clothing image by erasing pixels in the clothing area.
We further design a semantic consistency loss to facilitate the learning of high-level identity-related semantic features.
arXiv Detail & Related papers (2023-08-27T14:07:57Z) - Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe$2$) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe$2$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z) - Clothes-Invariant Feature Learning by Causal Intervention for
Clothes-Changing Person Re-identification [118.23912884472794]
Clothes-invariant feature extraction is critical to the clothes-changing person re-identification (CC-ReID)
We argue that there exists a strong spurious correlation between clothes and human identity, that restricts the common likelihood-based ReID method P(Y|X) to extract clothes-irrelevant features.
We propose a new Causal Clothes-Invariant Learning (CCIL) method to achieve clothes-invariant feature learning.
arXiv Detail & Related papers (2023-05-10T13:48:24Z) - CLIP-ReID: Exploiting Vision-Language Model for Image Re-Identification
without Concrete Text Labels [28.42405456691034]
We propose a two-stage strategy to facilitate a better visual representation in image re-identification tasks.
The key idea is to fully exploit the cross-modal description ability in CLIP through a set of learnable text tokens for each ID.
The effectiveness of the proposed strategy is validated on several datasets for the person or vehicle ReID tasks.
arXiv Detail & Related papers (2022-11-25T09:41:57Z) - A Semantic-aware Attention and Visual Shielding Network for
Cloth-changing Person Re-identification [29.026249268566303]
Cloth-changing person reidentification (ReID) is a newly emerging research topic that aims to retrieve pedestrians whose clothes are changed.
Since the human appearance with different clothes exhibits large variations, it is very difficult for existing approaches to extract discriminative and robust feature representations.
This work proposes a novel semantic-aware attention and visual shielding network for cloth-changing person ReID.
arXiv Detail & Related papers (2022-07-18T05:38:37Z) - No Token Left Behind: Explainability-Aided Image Classification and
Generation [79.4957965474334]
We present a novel explainability-based approach, which adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input.
Our method yields an improvement in the recognition rate, without additional training or fine-tuning.
arXiv Detail & Related papers (2022-04-11T07:16:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.