Sequential Transformer for End-to-End Person Search
- URL: http://arxiv.org/abs/2211.04323v1
- Date: Sun, 6 Nov 2022 09:32:30 GMT
- Title: Sequential Transformer for End-to-End Person Search
- Authors: Long Chen, Jinhua Xu
- Abstract summary: Person search aims to simultaneously localize and recognize a target person from realistic and uncropped gallery images.
In this paper, we propose a novel Sequential Transformer (SeqTR) for end-to-end person search to deal with this challenge.
Our SeqTR contains a detection transformer and a novel re-ID transformer that sequentially addresses detection and re-ID tasks.
- Score: 4.920657401819193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person Search aims to simultaneously localize and recognize a target person
from realistic and uncropped gallery images. One major challenge of person
search comes from the contradictory goals of the two sub-tasks, i.e., person
detection focuses on finding the commonness of all persons so as to distinguish
persons from the background, while person re-identification (re-ID) focuses on
the differences among different persons. In this paper, we propose a novel
Sequential Transformer (SeqTR) for end-to-end person search to deal with this
challenge. Our SeqTR contains a detection transformer and a novel re-ID
transformer that sequentially addresses detection and re-ID tasks. The re-ID
transformer comprises the self-attention layer that utilizes contextual
information and the cross-attention layer that learns local fine-grained
discriminative features of the human body. Moreover, the re-ID transformer is
shared and supervised by multi-scale features to improve the robustness of
learned person representations. Extensive experiments on two widely-used person
search benchmarks, CUHK-SYSU and PRW, show that our proposed SeqTR not only
outperforms all existing person search methods with a 59.3% mAP on PRW but also
achieves comparable performance to the state-of-the-art results with an mAP of
94.8% on CUHK-SYSU.
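The self-attention and cross-attention roles described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain scaled dot-product attention with no learned projections, multiple heads, normalization, or feed-forward layers, and it assumes each person query cross-attends only to feature tokens taken from its own detected box. The names `attention`, `reid_layer`, `person_queries`, and `person_tokens` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(a, b):
    """Element-wise residual sum of two lists of vectors."""
    return [[x + y for x, y in zip(u, v)] for u, v in zip(a, b)]


def reid_layer(person_queries, person_tokens):
    """Hypothetical re-ID layer (assumption, not the paper's exact design).

    person_queries: one embedding per detected person.
    person_tokens:  for each person, feature vectors from that person's box.
    """
    # Self-attention: every person query attends to all persons (context).
    ctx = add(person_queries,
              attention(person_queries, person_queries, person_queries))
    # Cross-attention: each query attends to tokens of its own box (local detail).
    local = [attention([q], toks, toks)[0] for q, toks in zip(ctx, person_tokens)]
    return add(ctx, local)


# Two detected persons with 2-D embeddings and a few box tokens each (toy data).
queries = [[1.0, 0.0], [0.0, 1.0]]
tokens = [
    [[0.5, 0.5], [1.0, 0.0]],
    [[0.0, 1.0], [0.2, 0.8], [0.4, 0.6]],
]
embeddings = reid_layer(queries, tokens)
```

In this sketch the detection stage is taken as given: the boxes have already been found, which mirrors the sequential ordering of detection followed by re-ID.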
Related papers
- Transformer for Object Re-Identification: A Survey [69.61542572894263]
Vision Transformers have spurred a growing number of studies delving deeper into Transformer-based Re-ID.
This paper provides a comprehensive review and in-depth analysis of the Transformer-based Re-ID.
Considering the trending unsupervised Re-ID, we propose a new Transformer baseline, UntransReID, achieving state-of-the-art performance.
arXiv Detail & Related papers (2024-01-13T03:17:57Z)
- PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement [59.6260680005195]
We present a novel Person Search framework based on the Diffusion model, PSDiff.
PSDiff formulates the person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths.
Following the new paradigm, we further design a new Collaborative Denoising Layer (CDL) to optimize detection and ReID sub-tasks in an iterative and collaborative way.
arXiv Detail & Related papers (2023-09-20T08:16:39Z)
- Learning Feature Recovery Transformer for Occluded Person Re-identification [71.18476220969647]
We propose a new approach called Feature Recovery Transformer (FRT) to address the two challenges simultaneously.
To reduce the interference of the noise during feature matching, we mainly focus on visible regions that appear in both images and develop a visibility graph to calculate the similarity.
In terms of the second challenge, based on the developed graph similarity, for each query image, we propose a recovery transformer that exploits the feature sets of its $k$-nearest neighbors in the gallery to recover the complete features.
arXiv Detail & Related papers (2023-01-05T02:36:16Z)
- PSTR: End-to-End One-Step Person Search With Transformers [140.32813648935752]
We propose a one-step transformer-based person search framework, PSTR.
PSS module contains a detection encoder-decoder for person detection along with a discriminative re-id decoder for person re-id.
On the challenging PRW benchmark, PSTR achieves a mean average precision (mAP) score of 56.5%.
arXiv Detail & Related papers (2022-04-07T10:22:33Z)
- Cascade Transformers for End-to-End Person Search [18.806369852341334]
We propose the Cascade Occluded Attention Transformer (COAT) for end-to-end person search.
COAT focuses on detecting people in the first stage, while later stages simultaneously and progressively refine the representation for person detection and re-identification.
We demonstrate the benefits of our method by achieving state-of-the-art performance on two benchmark datasets.
arXiv Detail & Related papers (2022-03-17T22:42:12Z)
- Subtask-dominated Transfer Learning for Long-tail Person Search [12.311100923753449]
Person search unifies person detection and person re-identification (Re-ID) to locate query persons from panoramic gallery images.
One major challenge comes from the imbalanced long-tail person identity distributions.
We propose a Subtask-dominated Transfer Learning (STL) method to solve this problem.
arXiv Detail & Related papers (2021-12-01T14:34:48Z)
- Diverse Part Discovery: Occluded Person Re-identification with Part-Aware Transformer [95.02123369512384]
Occluded person re-identification (Re-ID) is a challenging task as persons are frequently occluded by various obstacles or other persons.
We propose a novel end-to-end Part-Aware Transformer (PAT) for occluded person Re-ID through diverse part discovery.
arXiv Detail & Related papers (2021-06-08T04:29:07Z)
- Person Re-identification based on Robust Features in Open-world [0.0]
We propose a low-cost and high-efficiency method to address the shortcomings of existing re-ID research.
Our approach is based on a pose estimation model improved by group convolution to obtain the continuous key points of pedestrians.
Our method achieves Rank-1: 60.9%, Rank-5: 78.1%, and mAP: 49.2% on this dataset, which exceeds most existing state-of-the-art re-ID models.
arXiv Detail & Related papers (2021-02-22T06:49:28Z)
- Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z)
- FMT:Fusing Multi-task Convolutional Neural Network for Person Search [33.91664470686695]
We propose a fusing multi-task convolutional neural network (FMT-CNN) to tackle the correlation and heterogeneity of detection and re-identification.
Experimental results on the CUHK-SYSU Person Search dataset show that the performance of our proposed method is superior to state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-01T05:20:47Z)