Noisy-Correspondence Learning for Text-to-Image Person Re-identification
- URL: http://arxiv.org/abs/2308.09911v3
- Date: Thu, 28 Mar 2024 07:16:11 GMT
- Title: Noisy-Correspondence Learning for Text-to-Image Person Re-identification
- Authors: Yang Qin, Yingke Chen, Dezhong Peng, Xi Peng, Joey Tianyi Zhou, Peng Hu
- Abstract summary: We propose a novel Robust Dual Embedding method (RDE) to learn robust visual-semantic associations even with noisy correspondences.
Our method achieves state-of-the-art results both with and without synthetic noisy correspondences on three datasets.
- Score: 50.07634676709067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image person re-identification (TIReID) is a compelling topic in the cross-modal community; it aims to retrieve a target person based on a textual query. Although numerous TIReID methods have been proposed and have achieved promising performance, they implicitly assume that the training image-text pairs are correctly aligned, which is not always the case in real-world scenarios. In practice, some image-text pairs are inevitably under-correlated or even falsely correlated, a.k.a. noisy correspondence (NC), due to low image quality and annotation errors. To address this problem, we propose a novel Robust Dual Embedding method (RDE) that can learn robust visual-semantic associations even with NC. Specifically, RDE consists of two main components: 1) A Confident Consensus Division (CCD) module that leverages the dual-grained decisions of dual embedding modules to obtain a consensus set of clean training data, which enables the model to learn correct and reliable visual-semantic associations. 2) A Triplet Alignment Loss (TAL) that relaxes the conventional triplet ranking loss with the hardest negative samples to a log-exponential upper bound over all negative ones, thus preventing model collapse under NC while still focusing on hard-negative samples for promising performance. We conduct extensive experiments on three public benchmarks, namely CUHK-PEDES, ICFG-PEDES, and RSTPReID, to evaluate the performance and robustness of our RDE. Our method achieves state-of-the-art results both with and without synthetic noisy correspondences on all three datasets. Code is available at https://github.com/QinYang79/RDE.
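The relaxation behind TAL can be illustrated with a small NumPy sketch. It is not the authors' implementation (see the repository linked in the abstract for that); it only shows the general idea of replacing the hardest-negative term of a triplet ranking loss with a temperature-scaled log-sum-exp over all negatives, which is always at least as large as the maximum. The function names, the image-to-text direction only, the margin of 0.1, and the temperature `tau` of 0.02 are all assumptions chosen for illustration.

```python
import numpy as np


def hardest_negative_triplet(sim, margin=0.1):
    """Per-image triplet ranking loss that uses only the hardest negative.

    sim[i, j] is the similarity between image i and text j; the diagonal
    holds the matched (positive) pairs. Requires at least 2 pairs.
    """
    n = sim.shape[0]
    pos = np.diag(sim)
    # Mask out positives so they cannot be picked as negatives.
    neg = np.where(np.eye(n, dtype=bool), -np.inf, sim)
    return np.maximum(0.0, margin - pos + neg.max(axis=1))


def log_exp_relaxed_triplet(sim, margin=0.1, tau=0.02):
    """Same loss, with max over negatives replaced by tau * log-sum-exp.

    Since tau * log(sum_j exp(s_j / tau)) >= max_j s_j, this upper-bounds
    the hardest-negative loss while spreading gradient over all negatives.
    """
    n = sim.shape[0]
    pos = np.diag(sim)
    scaled = np.where(np.eye(n, dtype=bool), -np.inf, sim) / tau
    # Numerically stable log-sum-exp along each row.
    row_max = scaled.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(scaled - row_max).sum(axis=1))
    return np.maximum(0.0, margin - pos + tau * lse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sim = rng.uniform(-1.0, 1.0, size=(4, 4))  # toy similarity matrix
    print("hardest-negative:", hardest_negative_triplet(sim))
    print("log-exp relaxed: ", log_exp_relaxed_triplet(sim))
```

As `tau` shrinks, the relaxed loss approaches the hardest-negative loss; a larger `tau` spreads the penalty more evenly across negatives, so a single mislabeled pair has less influence on the update.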
Related papers
- DualFocus: A Unified Framework for Integrating Positive and Negative Descriptors in Text-based Person Retrieval [6.381155145404096]
We introduce DualFocus, a framework for integrating positive and negative descriptors.
By focusing on token-level comparisons, DualFocus significantly outperforms existing techniques in both precision and robustness.
Experimental results highlight DualFocus's superior performance on CUHK-PEDES, ICFG-PEDES, and RSTPReid.
arXiv Detail & Related papers (2024-05-13T04:21:00Z)
- Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR).
It aims to use sketches from unseen categories as queries to match images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment (SBKA) method for zero-shot sketch-based image retrieval.
arXiv Detail & Related papers (2023-12-16T04:50:34Z)
- Dynamic Weighted Combiner for Mixed-Modal Image Retrieval [8.683144453481328]
Mixed-Modal Image Retrieval (MMIR) as a flexible search paradigm has attracted wide attention.
Previous approaches always achieve limited performance, due to two critical factors.
We propose a Dynamic Weighted Combiner (DWC) to tackle the above challenges.
arXiv Detail & Related papers (2023-12-11T07:36:45Z)
- Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval [48.914550252133125]
We propose a novel Consensus Network (Css-Net) that self-adaptively learns from noisy triplets to minimize the negative effects of triplet ambiguity.
Css-Net can alleviate triplet ambiguity, achieving competitive performance on benchmarks, such as +2.77% R@10 and +6.67% R@50 on FashionIQ.
arXiv Detail & Related papers (2023-06-03T11:50:44Z)
- CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval [108.48540976175457]
We propose Coupled Diversity-Sensitive Momentum Contrastive Learning (CODER) for improving cross-modal representation.
We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitiveness is achieved by adaptive negative pair weighting.
Experiments conducted on two popular benchmarks, i.e., MSCOCO and Flickr30K, validate that CODER remarkably outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-08-21T08:37:50Z)
- Optimized latent-code selection for explainable conditional text-to-image GANs [8.26410341981427]
We present a variety of techniques to take a deep look into the latent space and semantic space of conditional text-to-image GAN models.
We propose a framework for finding good latent codes by utilizing a linear SVM.
arXiv Detail & Related papers (2022-04-27T03:12:55Z)
- SURDS: Self-Supervised Attention-guided Reconstruction and Dual Triplet Loss for Writer Independent Offline Signature Verification [16.499360910037904]
Offline Signature Verification (OSV) is a fundamental biometric task across various forensic, commercial and legal applications.
We propose a two-stage deep learning framework that leverages self-supervised representation learning as well as metric learning for writer-independent OSV.
The proposed framework has been evaluated on two publicly available offline signature datasets and compared with various state-of-the-art methods.
arXiv Detail & Related papers (2022-01-25T07:26:55Z)
- Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO is used to align the similarity scores considering the discrepancy between an image and its neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z)
- Devil's in the Details: Aligning Visual Clues for Conditional Embedding in Person Re-Identification [94.77172127405846]
We propose two key recognition patterns to better utilize the detailed information in pedestrian images.
CACE-Net achieves state-of-the-art performance on three public datasets.
arXiv Detail & Related papers (2020-09-11T06:28:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.