RaSa: Relation and Sensitivity Aware Representation Learning for
Text-based Person Search
- URL: http://arxiv.org/abs/2305.13653v1
- Date: Tue, 23 May 2023 03:53:57 GMT
- Title: RaSa: Relation and Sensitivity Aware Representation Learning for
Text-based Person Search
- Authors: Yang Bai, Min Cao, Daming Gao, Ziqiang Cao, Chen Chen, Zhenfeng Fan,
Liqiang Nie, Min Zhang
- Abstract summary: We propose a Relation and Sensitivity aware representation learning method (RaSa).
RaSa includes two novel tasks: Relation-Aware learning (RA) and Sensitivity-Aware learning (SA).
Experiments demonstrate that RaSa outperforms existing state-of-the-art methods by 6.94%, 4.45% and 15.35% in terms of Rank@1 on the CUHK-PEDES, ICFG-PEDES and RSTPReid datasets, respectively.
- Score: 51.09723403468361
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-based person search aims to retrieve the specified person images given a
textual description. The key to tackling such a challenging task is to learn
powerful multi-modal representations. Towards this, we propose a Relation and
Sensitivity aware representation learning method (RaSa), including two novel
tasks: Relation-Aware learning (RA) and Sensitivity-Aware learning (SA). For
one thing, existing methods cluster representations of all positive pairs
without distinction and overlook the noise problem caused by weak positive
pairs, in which the text and the paired image have noisy correspondences, thus
leading to overfitting. RA offsets the overfitting risk by introducing a novel
positive relation detection task (i.e., learning to distinguish strong and
weak positive pairs). For another, learning representations that are invariant
under data augmentation (i.e., insensitive to certain transformations) is a
common practice for improving robustness in existing methods. Beyond that, SA
encourages the representation to perceive sensitive transformations (i.e., by
learning to detect replaced words), further promoting the representation's
robustness. Experiments demonstrate that RaSa
outperforms existing state-of-the-art methods by 6.94%, 4.45% and 15.35% in
terms of Rank@1 on CUHK-PEDES, ICFG-PEDES and RSTPReid datasets, respectively.
Code is available at: https://github.com/Flame-Chasers/RaSa.
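The two auxiliary tasks read like standard detection heads. The sketch below shows one plausible PyTorch formulation, treating RA as binary strong-vs-weak classification on a fused pair embedding and SA as per-token replaced-word detection; all module names, shapes, and the binary formulations are assumptions, not the paper's exact implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class RaSaHeads(nn.Module):
    """Illustrative heads for the RA and SA objectives (assumed formulation)."""

    def __init__(self, dim=768):
        super().__init__()
        # RA head: classify a positive image-text pair as strong or weak.
        self.relation_head = nn.Linear(dim, 2)
        # SA head: per-token detector for replaced words.
        self.replace_head = nn.Linear(dim, 2)

    def ra_loss(self, pair_emb, is_strong):
        # pair_emb: (B, dim) fused embedding of a positive image-text pair
        # is_strong: (B,) long tensor, 1 = strong positive, 0 = weak positive
        return F.cross_entropy(self.relation_head(pair_emb), is_strong)

    def sa_loss(self, token_emb, replaced):
        # token_emb: (B, L, dim) contextual token embeddings of a corrupted caption
        # replaced: (B, L) long tensor, 1 where a word was replaced
        logits = self.replace_head(token_emb)              # (B, L, 2)
        return F.cross_entropy(logits.reshape(-1, 2), replaced.reshape(-1))
```

Both heads would be trained jointly with the main retrieval objectives; how the fused pair embedding is produced (e.g., by a cross-modal encoder) is outside this sketch.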
Related papers
- DualFocus: Integrating Plausible Descriptions in Text-based Person Re-identification [6.381155145404096]
We introduce DualFocus, a unified framework that integrates plausible descriptions to enhance the interpretative accuracy of vision-language models in Person Re-identification tasks.
To achieve a balance between coarse and fine-grained alignment of visual and textual embeddings, we propose the Dynamic Tokenwise Similarity (DTS) loss.
In comprehensive experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid, DualFocus demonstrates superior performance over state-of-the-art methods.
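The summary does not define the DTS loss; the sketch below shows one plausible reading as a late-interaction token-wise similarity, where each text token is matched to its best image patch before scores are pooled. The function name and pooling choices are assumptions.

```python
import torch

def tokenwise_similarity(text_tokens, image_tokens):
    # text_tokens:  (B, Lt, D) L2-normalized text token embeddings
    # image_tokens: (B, Li, D) L2-normalized image patch embeddings
    sim = torch.einsum("btd,bid->bti", text_tokens, image_tokens)  # (B, Lt, Li)
    # match each text token to its best image patch, then pool over tokens
    return sim.max(dim=-1).values.mean(dim=-1)  # (B,) fine-grained pair scores
```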
arXiv Detail & Related papers (2024-05-13T04:21:00Z)
- Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training [33.51524424536508]
Iterative Prompt Relabeling (IPR) is a novel algorithm that aligns images to text through iterative image sampling and prompt relabeling with feedback.
We conduct thorough experiments on SDv2 and SDXL, testing their capability to follow instructions on spatial relations.
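As described, IPR is a sample-check-relabel loop. A minimal sketch of that loop follows; `generate`, `check_spatial_relation`, `relabel_prompt`, and `finetune` are hypothetical stand-ins for the diffusion model call, the feedback signal, the relabeling step, and the training step.

```python
def iterative_prompt_relabeling(model, prompts, rounds=3):
    """Hedged sketch of an iterative sample-and-relabel loop with feedback."""
    dataset = []
    for _ in range(rounds):
        for prompt in prompts:
            image = model.generate(prompt)                  # sample an image
            if check_spatial_relation(image, prompt):       # feedback: does it match?
                dataset.append((prompt, image))             # keep the pair as-is
            else:
                # relabel the prompt to describe what was actually drawn
                dataset.append((relabel_prompt(image), image))
        model.finetune(dataset)                             # train on relabeled pairs
    return model
```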
arXiv Detail & Related papers (2023-12-23T11:10:43Z)
- Noisy-Correspondence Learning for Text-to-Image Person Re-identification [50.07634676709067]
We propose a novel Robust Dual Embedding method (RDE) to learn robust visual-semantic associations even with noisy correspondences.
Our method achieves state-of-the-art results both with and without synthetic noisy correspondences on three datasets.
arXiv Detail & Related papers (2023-08-19T05:34:13Z)
- DeepRING: Learning Roto-translation Invariant Representation for LiDAR based Place Recognition [12.708391665878844]
We propose DeepRING to learn a roto-translation invariant representation from a LiDAR scan.
There are two keys in DeepRING: the feature is extracted from the sinogram, and the feature is aggregated by the magnitude spectrum.
We formulate place recognition as a one-shot learning problem with each place being a class, leveraging relation learning to build representation similarity.
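The magnitude-spectrum trick is easy to see in isolation: rotating the scan about the sensor circularly shifts the sinogram along its angle axis, and the DFT magnitude along that axis is invariant to circular shifts. A minimal sketch, assuming a precomputed sinogram (e.g., a Radon transform of a bird's-eye-view occupancy grid):

```python
import torch

def rotation_invariant_descriptor(sinogram):
    # sinogram: (A, S) tensor, A angle bins x S range bins; a rotation of the
    # scan becomes a circular shift along the angle axis (dim 0)
    spectrum = torch.fft.fft(sinogram, dim=0)  # DFT over the angle axis
    return spectrum.abs()                      # magnitude is shift-invariant
```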
arXiv Detail & Related papers (2022-10-20T05:35:30Z)
- Learning Fair Representation via Distributional Contrastive Disentanglement [9.577369164287813]
Learning fair representations is crucial for achieving fairness or debiasing sensitive information.
We propose a new approach, learning FAir Representation via distributional CONtrastive Variational AutoEncoder (FarconVAE).
We show superior performance on fairness, pretrained model debiasing, and domain generalization tasks from various modalities.
arXiv Detail & Related papers (2022-06-17T12:58:58Z)
- Unified Contrastive Learning in Image-Text-Label Space [130.31947133453406]
Unified Contrastive Learning (UniCL) is an effective way of learning semantically rich yet discriminative representations.
UniCL on its own is a good learner on pure image-label data, rivaling supervised learning methods across three image classification datasets.
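The unified objective can be pictured as a bidirectional contrastive loss whose target matrix has extra positives wherever two samples share a class label, so image-text pairs and image-label data fit one formulation. A minimal sketch under that reading; the temperature and per-row/column target normalization are assumptions.

```python
import torch
import torch.nn.functional as F

def unified_contrastive_loss(img_emb, txt_emb, labels, temperature=0.07):
    # img_emb, txt_emb: (B, D) L2-normalized embeddings; labels: (B,) class ids
    logits = img_emb @ txt_emb.t() / temperature                 # (B, B)
    pos = (labels[:, None] == labels[None, :]).float()           # shared-label positives
    t_i2t = pos / pos.sum(dim=1, keepdim=True)                   # row-normalized targets
    t_t2i = pos / pos.sum(dim=0, keepdim=True)                   # column-normalized targets
    loss_i2t = -(t_i2t * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(t_t2i * F.log_softmax(logits, dim=0)).sum(dim=0).mean()
    return 0.5 * (loss_i2t + loss_t2i)
```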
arXiv Detail & Related papers (2022-04-07T17:34:51Z)
- Predicting What You Already Know Helps: Provable Self-Supervised Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantees learning a good representation.
We prove that a linear layer yields a small approximation error even for complex ground-truth function classes.
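The guarantee can be stated compactly: when the two views are (approximately) conditionally independent given the label, the pretext-optimal predictor is linearly sufficient for the downstream target. A simplified form, with constants and norms elided and the notation assumed:

```latex
% Simplified statement: psi* is the pretext-optimal (reconstruction) predictor.
\[
X_1 \perp X_2 \mid Y
\;\Longrightarrow\;
\exists\, W:\quad
\mathbb{E}\,\big\| W\,\psi^{*}(X_1) - \mathbb{E}[Y \mid X_1] \big\|^{2} \le \varepsilon,
\qquad
\psi^{*}(x_1) := \mathbb{E}[X_2 \mid X_1 = x_1].
\]
```

Under exact conditional independence the error vanishes; under approximate independence it scales with the degree of violation.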
arXiv Detail & Related papers (2020-08-03T17:56:13Z)
- Fairness by Learning Orthogonal Disentangled Representations [50.82638766862974]
We propose a novel disentanglement approach to the invariant representation problem.
We enforce the meaningful representation to be agnostic to sensitive information by entropy maximization.
The proposed approach is evaluated on five publicly available datasets.
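One common way to realize the entropy-maximization idea described above is to push a sensitive-attribute classifier's predictions on the target representation toward the uniform distribution, so the representation carries no usable sensitive information. A minimal sketch of such a regularizer; the adversarial training arrangement around it is an assumption.

```python
import torch
import torch.nn.functional as F

def entropy_maximization_loss(sensitive_logits):
    # sensitive_logits: (B, C) sensitive-attribute predictions made from the
    # target (non-sensitive) representation
    p = F.softmax(sensitive_logits, dim=1)
    entropy = -(p * torch.log(p.clamp_min(1e-8))).sum(dim=1)
    return -entropy.mean()  # minimizing this maximizes prediction entropy
```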
arXiv Detail & Related papers (2020-03-12T11:09:15Z)
- Learning to Compare Relation: Semantic Alignment for Few-Shot Learning [48.463122399494175]
We present a novel semantic alignment model to compare relations, which is robust to content misalignment.
We conduct extensive experiments on several few-shot learning datasets.
arXiv Detail & Related papers (2020-02-29T08:37:02Z)