DSSL: Deep Surroundings-person Separation Learning for Text-based Person
Retrieval
- URL: http://arxiv.org/abs/2109.05534v1
- Date: Sun, 12 Sep 2021 15:09:09 GMT
- Title: DSSL: Deep Surroundings-person Separation Learning for Text-based Person
Retrieval
- Authors: Aichun Zhu, Zijie Wang, Yifeng Li, Xili Wan, Jing Jin, Tian Wang,
Fangqiang Hu, Gang Hua
- Abstract summary: We propose a novel Deep Surroundings-person Separation Learning (DSSL) model in this paper.
A surroundings-person separation and fusion mechanism plays the key role in realizing an accurate and effective surroundings-person separation.
Extensive experiments are carried out to evaluate the proposed DSSL on CUHK-PEDES.
- Score: 40.70100506088116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many previous methods for text-based person retrieval are devoted to
learning a latent common space mapping, with the purpose of extracting
modality-invariant features from both the visual and textual modalities.
Nevertheless, due to the complexity of high-dimensional data, such unconstrained
mapping paradigms are not able to properly capture discriminative clues about
the corresponding person while dropping the misaligned information. Intuitively,
the information contained in visual data can be divided into person information
(PI) and surroundings information (SI), which are mutually exclusive. To this
end, we propose a novel Deep Surroundings-person Separation Learning (DSSL)
model in this paper to effectively extract and match person information, and
hence achieve superior retrieval accuracy. A surroundings-person separation and
fusion mechanism plays the key role in realizing an accurate and effective
surroundings-person separation under a mutual-exclusion constraint. In order to
adequately utilize multi-modal and multi-granular information for higher
retrieval accuracy, five diverse alignment paradigms are adopted. Extensive
experiments are carried out to evaluate the proposed DSSL on CUHK-PEDES, which
is currently the only accessible dataset for the text-based person retrieval
task. DSSL achieves state-of-the-art performance on CUHK-PEDES. To properly
evaluate the proposed DSSL in real scenarios, a Real Scenarios Text-based Person
Re-identification (RSTPReid) dataset is constructed to benefit future research
on text-based person retrieval, and it will be made publicly available.
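The abstract gives no implementation details, but the core separation idea can be illustrated with a minimal sketch: project a single visual feature into a PI branch and an SI branch, then penalize overlap between the two. This is not the authors' code; the module names, feature dimensions, and the cosine-based form of the mutual-exclusion penalty are assumptions made for illustration only.

```python
# Minimal sketch (not the authors' implementation) of surroundings-person
# separation under a mutual-exclusion constraint. Dimensions and the
# cosine-based penalty are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SurroundingsPersonSeparator(nn.Module):
    """Projects a shared visual feature into PI and SI branches."""

    def __init__(self, feat_dim: int = 2048, embed_dim: int = 512):
        super().__init__()
        self.person_head = nn.Linear(feat_dim, embed_dim)        # PI branch
        self.surroundings_head = nn.Linear(feat_dim, embed_dim)  # SI branch

    def forward(self, visual_feat: torch.Tensor):
        pi = self.person_head(visual_feat)
        si = self.surroundings_head(visual_feat)
        return pi, si


def mutual_exclusion_loss(pi: torch.Tensor, si: torch.Tensor) -> torch.Tensor:
    """Penalizes overlap between PI and SI by driving their cosine
    similarity toward zero (one plausible form of the constraint)."""
    cos = F.cosine_similarity(pi, si, dim=-1)
    return cos.pow(2).mean()


if __name__ == "__main__":
    separator = SurroundingsPersonSeparator()
    feats = torch.randn(8, 2048)          # e.g. pooled CNN features for 8 images
    pi, si = separator(feats)
    loss = mutual_exclusion_loss(pi, si)  # added to the retrieval/alignment losses
    print(loss.item())
```

In the full model, the PI embedding would presumably be the representation matched against the textual features through the alignment paradigms mentioned above, while the SI branch absorbs background clutter; the exact alignment scheme is not specified in the abstract.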
Related papers
- Semantic Meta-Split Learning: A TinyML Scheme for Few-Shot Wireless Image Classification [50.28867343337997]
This work presents a TinyML-based semantic communication framework for few-shot wireless image classification.
We exploit split learning to limit the computations performed by end-users while ensuring privacy preservation.
Meta-learning overcomes data availability concerns and speeds up training by utilizing similarly trained tasks.
arXiv Detail & Related papers (2024-09-03T05:56:55Z) - From Attributes to Natural Language: A Survey and Foresight on Text-based Person Re-identification [4.400729890122927]
The aim of text-based person Re-ID is to recognize specific pedestrians by scrutinizing attributes/natural language descriptions.
There is a notable absence of comprehensive reviews dedicated to summarizing the text-based person Re-ID from a technical perspective.
We introduce a taxonomy spanning Evaluation, Strategy, Architecture, and Optimization dimensions, providing a comprehensive survey of the text-based person Re-ID task.
arXiv Detail & Related papers (2024-07-31T18:16:18Z) - Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL).
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to SOTA while being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z) - Prototype-Guided Text-based Person Search based on Rich Chinese
Descriptions [20.02304350708749]
We propose a large-scale benchmark dataset named PRW-TPS-CN based on the widely used person search dataset PRW.
Our dataset contains 47,102 sentences, considerably more information than existing datasets provide.
To alleviate the inconsistency between person detection and text-based person retrieval, we take advantage of the rich texts in the PRW-TPS-CN dataset.
arXiv Detail & Related papers (2023-12-22T17:08:14Z) - Contrastive Transformer Learning with Proximity Data Generation for
Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and insufficient annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z) - Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Text is no more Enough! A Benchmark for Profile-based Spoken Language
Understanding [26.549776399115203]
Profile-based Spoken Language Understanding (ProSLU) requires a model that relies not only on the plain text but also on the supporting profile information to predict the correct intents and slots.
We introduce a large-scale human-annotated Chinese dataset with over 5K utterances and their corresponding supporting profile information.
Experimental results reveal that all existing text-based SLU models fail to work when the utterances are semantically ambiguous.
arXiv Detail & Related papers (2021-12-22T15:22:17Z) - Text-based Person Search in Full Images via Semantic-Driven Proposal
Generation [42.25611020956918]
We propose a new end-to-end learning framework which jointly optimizes the pedestrian detection, identification, and visual-semantic feature embedding tasks.
To take full advantage of the query text, the semantic features are leveraged to instruct the Region Proposal Network to pay more attention to the text-described proposals.
arXiv Detail & Related papers (2021-09-27T11:42:40Z)