DualFocus: Integrating Plausible Descriptions in Text-based Person Re-identification
- URL: http://arxiv.org/abs/2405.07459v2
- Date: Fri, 16 Aug 2024 10:53:01 GMT
- Title: DualFocus: Integrating Plausible Descriptions in Text-based Person Re-identification
- Authors: Yuchuan Deng, Zhanpeng Hu, Jiakun Han, Chuang Deng, Qijun Zhao
- Abstract summary: We introduce DualFocus, a unified framework that integrates plausible descriptions to enhance the interpretative accuracy of vision-language models in Person Re-identification tasks.
To achieve a balance between coarse and fine-grained alignment of visual and textual embeddings, we propose the Dynamic Tokenwise Similarity (DTS) loss.
In comprehensive experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid, DualFocus demonstrates superior performance over state-of-the-art methods.
- Score: 6.381155145404096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-based Person Re-identification (TPR) aims to retrieve specific individual images from datasets based on textual descriptions. Existing TPR methods primarily focus on recognizing explicit and positive characteristics, often overlooking the role of negative descriptions. This oversight can lead to false positives: images that meet the positive criteria but should be excluded based on the negative descriptions. To address these limitations, we introduce DualFocus, a unified framework that integrates plausible descriptions to enhance the interpretative accuracy of vision-language models in TPR tasks. DualFocus leverages Dual (Positive/Negative) Attribute Prompt Learning (DAPL), which incorporates Dual Image-Attribute Contrastive (DIAC) Learning and Sensitive Image-Attributes Matching (SIAM) Learning, enabling the detection of non-existent attributes and reducing false positives. To achieve a balance between coarse and fine-grained alignment of visual and textual embeddings, we propose the Dynamic Tokenwise Similarity (DTS) loss, which refines the representation of both matching and non-matching descriptions, thereby improving the matching process through detailed and adaptable similarity assessments. In comprehensive experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid, DualFocus demonstrates superior performance over state-of-the-art methods, significantly enhancing both precision and robustness in TPR.
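The abstract describes the DTS loss only at a high level. As an illustration only, a minimal sketch of tokenwise similarity between visual and textual token embeddings, assuming a max-over-tokens aggregation (the function name, shapes, and aggregation rule are assumptions, not the paper's exact formulation), might look like:

```python
import numpy as np

def tokenwise_similarity(img_tokens: np.ndarray, txt_tokens: np.ndarray) -> float:
    """Fine-grained similarity between image and text token embeddings.

    img_tokens: (num_image_tokens, dim); txt_tokens: (num_text_tokens, dim).
    Each text token is matched to its most similar image token (and vice
    versa), and the two directions are averaged. The max-over-tokens
    aggregation here is an assumed simplification; the paper's DTS loss
    additionally makes the aggregation dynamic.
    """
    # L2-normalise rows so dot products are cosine similarities.
    img = img_tokens / np.linalg.norm(img_tokens, axis=1, keepdims=True)
    txt = txt_tokens / np.linalg.norm(txt_tokens, axis=1, keepdims=True)
    sim = txt @ img.T                    # (num_txt, num_img) cosine matrix
    t2i = sim.max(axis=1).mean()         # each text token -> best image token
    i2t = sim.max(axis=0).mean()         # each image token -> best text token
    return float((t2i + i2t) / 2.0)
```

A score of this form gives a fine-grained alternative to a single global image-text cosine, which is the coarse-vs-fine balance the abstract alludes to.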
Related papers
- Noisy-Correspondence Learning for Text-to-Image Person Re-identification [50.07634676709067]
We propose a novel Robust Dual Embedding method (RDE) to learn robust visual-semantic associations even with noisy correspondences.
Our method achieves state-of-the-art results both with and without synthetic noisy correspondences on three datasets.
arXiv Detail & Related papers (2023-08-19T05:34:13Z)
- DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations [79.433122872973]
Multi-label image recognition in the low-label regime is a task of great challenge and practical significance.
We leverage the powerful alignment between textual and visual features pretrained with millions of auxiliary image-text pairs.
We introduce an efficient and effective framework called Evidence-guided Dual Context Optimization (DualCoOp++)
arXiv Detail & Related papers (2023-08-03T17:33:20Z)
- PV2TEA: Patching Visual Modality to Textual-Established Information Extraction [59.76117533540496]
We patch the visual modality to the textual-established attribute information extractor.
PV2TEA is an encoder-decoder architecture equipped with three bias reduction schemes.
Empirical results on real-world e-Commerce datasets demonstrate up to an 11.74% absolute (20.97% relative) F1 increase over unimodal baselines.
arXiv Detail & Related papers (2023-06-01T05:39:45Z)
- RaSa: Relation and Sensitivity Aware Representation Learning for Text-based Person Search [51.09723403468361]
We propose a Relation and Sensitivity aware representation learning method (RaSa)
RaSa includes two novel tasks: Relation-Aware learning (RA) and Sensitivity-Aware learning (SA)
Experiments demonstrate that RaSa outperforms existing state-of-the-art methods by 6.94%, 4.45%, and 15.35% in terms of Rank@1 on the evaluated datasets.
arXiv Detail & Related papers (2023-05-23T03:53:57Z)
- Weakly-Supervised Text-driven Contrastive Learning for Facial Behavior Understanding [12.509298933267221]
We introduce a two-stage Contrastive Learning with Text-Embedded framework for facial behavior understanding.
The first stage is a weakly-supervised contrastive learning method that learns representations from positive-negative pairs constructed using coarse-grained activity information.
The second stage trains the recognition of facial expressions or facial action units by maximizing the similarity between images and the corresponding text label names.
arXiv Detail & Related papers (2023-03-31T18:21:09Z)
- Towards Effective Image Manipulation Detection with Proposal Contrastive Learning [61.5469708038966]
We propose Proposal Contrastive Learning (PCL) for effective image manipulation detection.
Our PCL consists of a two-stream architecture by extracting two types of global features from RGB and noise views respectively.
Our PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features.
arXiv Detail & Related papers (2022-10-16T13:30:13Z)
- TaCo: Textual Attribute Recognition via Contrastive Learning [9.042957048594825]
TaCo is a contrastive framework for textual attribute recognition tailored toward the most common document scenes.
We design the learning paradigm from three perspectives: 1) generating attribute views, 2) extracting subtle but crucial details, and 3) exploiting valued view pairs for learning.
Experiments show that TaCo surpasses the supervised counterparts and advances the state-of-the-art remarkably on multiple attribute recognition tasks.
arXiv Detail & Related papers (2022-08-22T09:45:34Z)
- Pose-guided Visible Part Matching for Occluded Person ReID [80.81748252960843]
We propose a Pose-guided Visible Part Matching (PVPM) method that jointly learns the discriminative features with pose-guided attention and self-mines the part visibility.
Experimental results on three reported occluded benchmarks show that the proposed method achieves competitive performance to state-of-the-art methods.
arXiv Detail & Related papers (2020-04-01T04:36:51Z)
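Several of the related entries above (RDE, PCL, TaCo, the weakly-supervised text-driven framework) rest on contrastive objectives over paired views. As an illustration only, not tied to any one paper's exact loss and with all names assumed, a minimal symmetric InfoNCE-style sketch over two views of a batch:

```python
import numpy as np

def info_nce(view_a: np.ndarray, view_b: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss over a batch of paired features.

    view_a, view_b: (batch, dim) features from two views of the same samples
    (e.g. two augmentations, or an RGB stream and a noise stream); row i of
    each view forms a positive pair. This generic formulation is a sketch,
    not any cited paper's exact objective.
    """
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature          # (batch, batch) similarity logits
    labels = np.arange(len(a))                # positives lie on the diagonal

    def xent(l: np.ndarray) -> float:
        l = l - l.max(axis=1, keepdims=True)  # shift for numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(l)), labels].mean()

    # Cross-entropy in both matching directions (a->b and b->a), averaged.
    return float((xent(logits) + xent(logits.T)) / 2.0)
```

The loss is small when matched rows are each other's nearest neighbours in the batch and grows as positives drift apart, which is the pull/push behaviour these contrastive methods share.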
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.