Towards Fully Decoupled End-to-End Person Search
- URL: http://arxiv.org/abs/2309.04967v3
- Date: Sun, 10 Mar 2024 14:00:23 GMT
- Title: Towards Fully Decoupled End-to-End Person Search
- Authors: Pengcheng Zhang, Xiao Bai, Jin Zheng, Xin Ning
- Abstract summary: End-to-end person search aims to jointly detect and re-identify a target person in raw scene images with a unified model.
The detection task unifies all persons while the re-id task discriminates different identities, resulting in conflicting optimal objectives.
Existing methods are still sub-optimal on one or two of the sub-tasks due to their partially decoupled models.
A task-incremental person search network is proposed to incrementally construct an end-to-end model for the detection and re-id sub-tasks.
- Score: 15.126269826140247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end person search aims to jointly detect and re-identify a target
person in raw scene images with a unified model. The detection task unifies all
persons while the re-id task discriminates different identities, resulting in
conflicting optimal objectives. Existing works proposed to decouple end-to-end
person search to alleviate such conflict. Yet these methods are still
sub-optimal on one or two of the sub-tasks due to their partially decoupled
models, which limits the overall person search performance. In this paper, we
propose to fully decouple person search towards optimal person search. A
task-incremental person search network is proposed to incrementally construct
an end-to-end model for the detection and re-id sub-tasks, which decouples the
model architecture for the two sub-tasks. The proposed task-incremental network
allows task-incremental training for the two conflicting tasks. This enables
independent learning for different objectives, thus fully decoupling the model
for person search. Comprehensive experimental evaluations demonstrate the
effectiveness of the proposed fully decoupled models for end-to-end person
search.
Related papers
- PSDiff: Diffusion Model for Person Search with Iterative and
Collaborative Refinement [59.6260680005195]
We present a novel Person Search framework based on the Diffusion model, PSDiff.
PSDiff formulates the person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths.
Following the new paradigm, we further design a new Collaborative Denoising Layer (CDL) to optimize detection and ReID sub-tasks in an iterative and collaborative way.
arXiv Detail & Related papers (2023-09-20T08:16:39Z)
- Grouped Adaptive Loss Weighting for Person Search [44.713344415358414]
Person search is a typical multi-task learning problem, especially when solved in an end-to-end manner.
We propose a Grouped Adaptive Loss Weighting (GALW) method which adjusts the weight of each task automatically and dynamically.
arXiv Detail & Related papers (2022-09-23T09:32:54Z)
- Suspected Object Matters: Rethinking Model's Prediction for One-stage
Visual Grounding [93.82542533426766]
We propose a Suspected Object Transformation mechanism (SOT) to encourage the target object selection among the suspected ones.
SOT can be seamlessly integrated into existing CNN and Transformer-based one-stage visual grounders.
Extensive experiments demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2022-03-10T06:41:07Z)
- Subtask-dominated Transfer Learning for Long-tail Person Search [12.311100923753449]
Person search unifies person detection and person re-identification (Re-ID) to locate query persons from panoramic gallery images.
One major challenge comes from the imbalanced long-tail person identity distributions.
We propose a Subtask-dominated Transfer Learning (STL) method to solve this problem.
arXiv Detail & Related papers (2021-12-01T14:34:48Z)
- Making Person Search Enjoy the Merits of Person Re-identification [12.311100923753449]
We propose a faster and stronger one-step person search framework, the Teacher-guided Disentangling Networks (TDN).
The proposed TDN can significantly boost the person search performance by transferring the advanced person Re-ID knowledge to the person search model.
We also propose a Knowledge Transfer Bridge module to bridge the scale gap caused by different input formats between the Re-ID model and one-step person search model.
arXiv Detail & Related papers (2021-08-24T06:00:13Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature
Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image
Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.