Prototype-Guided Text-based Person Search based on Rich Chinese
Descriptions
- URL: http://arxiv.org/abs/2312.14834v1
- Date: Fri, 22 Dec 2023 17:08:14 GMT
- Title: Prototype-Guided Text-based Person Search based on Rich Chinese
Descriptions
- Authors: Ziqiang Wu, Bingpeng Ma
- Abstract summary: We propose a large-scale benchmark dataset named PRW-TPS-CN based on the widely used person search dataset PRW.
Our dataset contains 47,102 sentences, providing considerably more information than existing datasets.
To alleviate the inconsistency between person detection and text-based person retrieval, we take advantage of the rich texts in the PRW-TPS-CN dataset.
- Score: 20.02304350708749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-based person search aims to simultaneously localize and identify the
target person based on query text from uncropped scene images, which can be
regarded as a unified task of person detection and text-based person
retrieval. In this work, we propose a large-scale benchmark dataset named
PRW-TPS-CN based on the widely used person search dataset PRW. Our dataset
contains 47,102 sentences, providing considerably more information than
existing datasets. These texts precisely describe the person images from top to
bottom, which is in line with the natural description order. We also provide both
Chinese and English descriptions in our dataset for more comprehensive
evaluation. These characteristics make our dataset more applicable. To
alleviate the inconsistency between person detection and text-based person
retrieval, we take advantage of the rich texts in the PRW-TPS-CN dataset. We
propose to aggregate multiple texts as text prototypes to maintain the
prominent text features of a person, which can better reflect the whole
character of a person. The overall prototypes guide the generation of an image
attention map that removes the detection misalignment which degrades
text-based person retrieval. Thus, the inconsistency between person detection
and text-based person retrieval is largely alleviated. We conduct extensive
experiments on the PRW-TPS-CN dataset. The experimental results show the
PRW-TPS-CN dataset's effectiveness and the state-of-the-art performance of our
approach.
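The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: several text embeddings of one person are aggregated into a prototype, and that prototype is compared against image region features to produce an attention map. The mean pooling, the cosine similarity, the softmax, and every function name here are assumptions made for illustration, not the authors' method.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def text_prototype(text_embeddings):
    """Assumed aggregation: mean of the normalized sentence embeddings."""
    normalized = [l2_normalize(e) for e in text_embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def attention_map(prototype, region_features):
    """Assumed attention: softmax over prototype-to-region cosine similarities."""
    sims = [
        sum(p * r for p, r in zip(prototype, l2_normalize(feat)))
        for feat in region_features
    ]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data: two sentence embeddings of one person and three image regions.
texts = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.0]]
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

prototype = text_prototype(texts)
weights = attention_map(prototype, regions)
```

In this toy setup the first region is the one most aligned with both sentences, so it receives the largest attention weight. A real system would obtain the embeddings from trained text and image encoders.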
Related papers
- From Attributes to Natural Language: A Survey and Foresight on Text-based Person Re-identification [4.400729890122927]
The aim of text-based person Re-ID is to recognize specific pedestrians by scrutinizing attributes/natural language descriptions.
There is a notable absence of comprehensive reviews dedicated to summarizing the text-based person Re-ID from a technical perspective.
We introduce a taxonomy spanning Evaluation, Strategy, Architecture, and Optimization dimensions, providing a comprehensive survey of the text-based person Re-ID task.
arXiv Detail & Related papers (2024-07-31T18:16:18Z)
- Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis [52.34110239735265]
We present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis.
Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance.
arXiv Detail & Related papers (2024-05-13T05:48:35Z)
- TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model [17.77384627944455]
Existing scene text spotters are designed to locate and transcribe texts from images.
Our proposed scene text spotter leverages advanced PLMs to enhance performance without fine-grained detection.
Benefiting from the comprehensive language knowledge gained during the pre-training phase, the PLM-based recognition module effectively handles complex scenarios.
arXiv Detail & Related papers (2024-03-15T06:38:25Z)
- GPT-generated Text Detection: Benchmark Dataset and Tensor-based Detection Method [4.802604527842989]
We present the GPT Reddit dataset (GRiD), a novel Generative Pretrained Transformer (GPT)-generated text detection dataset.
The dataset consists of context-prompt pairs based on Reddit with human-generated and ChatGPT-generated responses.
To showcase the dataset's utility, we benchmark several detection methods on it, demonstrating their efficacy in distinguishing between human and ChatGPT-generated responses.
arXiv Detail & Related papers (2024-03-12T05:15:21Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that input a single image of an individual and ground the generation process along with text describing the desired visual context.
We introduce a standardized dataset (Stellar) that contains personalized prompts coupled with images of individuals that is an order of magnitude larger than existing relevant datasets and where rich semantic ground-truth annotations are readily available.
We derive a simple yet efficient, personalized text-to-image baseline that does not require test-time fine-tuning for each subject and which sets quantitatively and in human trials a new SoTA.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to a significant modality gap, fine-grained differences, and the insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Text-based Person Search without Parallel Image-Text Data [52.63433741872629]
Text-based person search (TBPS) aims to retrieve the images of the target person from a large image gallery based on a given natural language description.
Existing methods are dominated by training models with parallel image-text pairs, which are very costly to collect.
In this paper, we make the first attempt to explore TBPS without parallel image-text data.
arXiv Detail & Related papers (2023-05-22T12:13:08Z)
- Text-based Person Search in Full Images via Semantic-Driven Proposal Generation [42.25611020956918]
We propose a new end-to-end learning framework which jointly optimizes the pedestrian detection, identification and visual-semantic feature embedding tasks.
To take full advantage of the query text, the semantic features are leveraged to instruct the Region Proposal Network to pay more attention to the text-described proposals.
arXiv Detail & Related papers (2021-09-27T11:42:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.