Diverse Knowledge Distillation for End-to-End Person Search
- URL: http://arxiv.org/abs/2012.11187v1
- Date: Mon, 21 Dec 2020 09:04:27 GMT
- Title: Diverse Knowledge Distillation for End-to-End Person Search
- Authors: Xinyu Zhang, Xinlong Wang, Jia-Wang Bian, Chunhua Shen, Mingyu You
- Abstract summary: Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
- Score: 81.4926655119318
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Person search aims to localize and identify a specific person from a gallery
of images. Recent methods can be categorized into two groups, i.e., two-step
and end-to-end approaches. The former views person search as two independent
tasks and achieves dominant results using separately trained person detection
and re-identification (Re-ID) models. The latter performs person search in an
end-to-end fashion. Although the end-to-end approaches yield higher inference
efficiency, they lag well behind their two-step counterparts in terms of
accuracy. In this paper, we argue that the gap between the two kinds of methods
is mainly caused by the Re-ID sub-networks of end-to-end methods. To this end,
we propose a simple yet strong end-to-end network with diverse knowledge
distillation to break the bottleneck. We also design a spatial-invariant
augmentation that helps the model become invariant to inaccurate detection results.
Experimental results on the CUHK-SYSU and PRW datasets demonstrate the
superiority of our method over existing approaches: it achieves accuracy on par
with state-of-the-art two-step methods while maintaining high
efficiency due to the single joint model. Code is available at:
https://git.io/DKD-PersonSearch.
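The abstract does not spell out how the "diverse" distillation is built, so the following is only a minimal sketch of the generic, response-based form of knowledge distillation: a student is trained to match a teacher's temperature-softened output distribution. The function names, the temperature value, and the use of plain Python lists are assumptions made for illustration and are not taken from the paper or its code.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) between softened distributions.

    The T^2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures. This is a generic formulation, not
    the specific loss used in the paper.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2


if __name__ == "__main__":
    # Hypothetical identity logits for one detected person.
    teacher = [2.0, 0.5, -1.0]
    student = [1.0, 1.0, 0.0]
    print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge, which is the signal a weaker Re-ID sub-network could learn from in a setup of this kind.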
Related papers
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and insufficient annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement [59.6260680005195]
We present a novel Person Search framework based on the Diffusion model, PSDiff.
PSDiff formulates person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths.
Following the new paradigm, we further design a new Collaborative Denoising Layer (CDL) to optimize detection and ReID sub-tasks in an iterative and collaborative way.
arXiv Detail & Related papers (2023-09-20T08:16:39Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation [20.42993976179691]
We propose a novel method to extend ANN search to arbitrary matching functions.
Our main idea is to perform a greedy walk with a matching function in a similarity graph constructed from all items.
The proposed method has been fully deployed in the Taobao display advertising platform and brings a considerable advertising revenue increase.
arXiv Detail & Related papers (2022-02-14T07:55:57Z)
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z)
- Mining the Benefits of Two-stage and One-stage HOI Detection [26.919979955155664]
Two-stage methods have dominated Human-Object Interaction (HOI) detection for several years.
One-stage methods struggle to strike an appropriate trade-off in multi-task learning, i.e., between object detection and interaction classification.
We propose a novel one-stage framework that disentangles human-object detection and interaction classification in a cascade manner.
arXiv Detail & Related papers (2021-08-11T07:38:09Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.