Related papers: CPCL: Cross-Modal Prototypical Contrastive Learning for Weakly Supervised Text-based Person Re-Identification

CPCL: Cross-Modal Prototypical Contrastive Learning for Weakly Supervised Text-based Person Re-Identification

URL: http://arxiv.org/abs/2401.10011v1
Date: Thu, 18 Jan 2024 14:27:01 GMT
Title: CPCL: Cross-Modal Prototypical Contrastive Learning for Weakly Supervised Text-based Person Re-Identification
Authors: Yanwei Zheng, Xinpeng Zhao, Chuanlin Lan, Xiaowei Zhang, Bowen Huang, Jibin Yang, Dongxiao Yu
Abstract summary: Weakly supervised text-based person re-identification (TPRe-ID) seeks to retrieve images of a target person using textual descriptions. The primary challenge is the intra-class differences, encompassing intra-modal feature variations and cross-modal semantic gaps. In practice, the CPCL introduces the CLIP model to weakly supervised TPRe-ID for the first time, mapping visual and textual instances into a shared latent space.
Score: 10.64115914599574
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Weakly supervised text-based person re-identification (TPRe-ID) seeks to retrieve images of a target person using textual descriptions, without relying on identity annotations and is more challenging and practical. The primary challenge is the intra-class differences, encompassing intra-modal feature variations and cross-modal semantic gaps. Prior works have focused on instance-level samples and ignored prototypical features of each person which are intrinsic and invariant. Toward this, we propose a Cross-Modal Prototypical Contrastive Learning (CPCL) method. In practice, the CPCL introduces the CLIP model to weakly supervised TPRe-ID for the first time, mapping visual and textual instances into a shared latent space. Subsequently, the proposed Prototypical Multi-modal Memory (PMM) module captures associations between heterogeneous modalities of image-text pairs belonging to the same person through the Hybrid Cross-modal Matching (HCM) module in a many-to-many mapping fashion. Moreover, the Outlier Pseudo Label Mining (OPLM) module further distinguishes valuable outlier samples from each modality, enhancing the creation of more reliable clusters by mining implicit relationships between image-text pairs. Experimental results demonstrate that our proposed CPCL attains state-of-the-art performance on all three public datasets, with a significant improvement of 11.58%, 8.77% and 5.25% in Rank@1 accuracy on CUHK-PEDES, ICFG-PEDES and RSTPReid datasets, respectively. The code is available at https://github.com/codeGallery24/CPCL.

Related papers

Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID [82.12123628480371]
Unsupervised person re-identification (USL-VI-ReID) seeks to match pedestrian images of the same individual across different modalities without human annotations for model learning. Previous methods unify pseudo-labels of cross-modality images through label association algorithms and then design contrastive learning framework for global feature learning. We propose a Semantic-Aligned Learning with Collaborative Refinement (SALCR) framework, which builds up objective for specific fine-grained patterns emphasized by each modality.
arXiv Detail & Related papers (2025-04-27T13:58:12Z)
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training [17.27516384073838]
We propose CMAL, a Cross-Modal Associative Learning framework with anchor points detection and cross-modal associative learning. CMAL achieves competitive performance against previous CMCL-based methods on four common downstream vision-and-language tasks.
arXiv Detail & Related papers (2024-10-16T14:12:26Z)
Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification [30.983346937558743]
Key challenges in USL-VI-ReID are to effectively generate pseudo-labels and establish pseudo-label correspondences. We propose a Multi-Memory Matching framework for USL-VI-ReID. Experiments on the public SYSU-MM01 and RegDB datasets demonstrate the reliability of the established cross-modality correspondences.
arXiv Detail & Related papers (2024-01-12T01:24:04Z)
FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction plays as a core function module in personalized online services. Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality. Pretrained Language Models(PLMs) has given rise to another paradigm, which takes as inputs the sentences of textual modality. We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models(FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z)
Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation [59.37587762543934]
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS) Existing methods suffer from a granularity inconsistency regarding the usage of group tokens. We propose the prototypical guidance network (PGSeg) that incorporates multi-modal regularization.
arXiv Detail & Related papers (2023-10-29T13:18:00Z)
Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID [56.573905143954015]
We propose a novel bilateral cluster matching-based learning framework to reduce the modality gap by matching cross-modality clusters. Under such a supervisory signal, a Modality-Specific and Modality-Agnostic (MSMA) contrastive learning framework is proposed to align features jointly at a cluster-level. Experiments on the public SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-05-22T03:27:46Z)
End-to-End Context-Aided Unicity Matching for Person Re-identification [100.02321122258638]
We propose an end-to-end person unicity matching architecture for learning and refining the person matching relations. We use the samples' global context relationship to refine the soft matching results and reach the matching unicity through bipartite graph matching. Given full consideration to real-world person re-identification applications, we achieve the unicity matching in both one-shot and multi-shot settings.
arXiv Detail & Related papers (2022-10-20T07:33:57Z)
InsCon:Instance Consistency Feature Representation via Self-Supervised Learning [9.416267640069297]
We propose a new end-to-end self-supervised framework called InsCon, which is devoted to capturing multi-instance information. InsCon builds a targeted learning paradigm that applies multi-instance images as input, aligning the learned feature between corresponding instance views. On the other hand, InsCon introduces the pull and push of cell-instance, which utilizes cell consistency to enhance fine-grained feature representation.
arXiv Detail & Related papers (2022-03-15T07:09:00Z)
CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images. One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships. We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
Referring Image Segmentation via Cross-Modal Progressive Comprehension [94.70482302324704]
Referring image segmentation aims at segmenting the foreground masks of the entities that can well match the description given in the natural language expression. Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities. We propose a Cross-Modal Progressive (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address the challenging task.
arXiv Detail & Related papers (2020-10-01T16:02:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.