Efficient Person Search: An Anchor-Free Approach
- URL: http://arxiv.org/abs/2109.00211v1
- Date: Wed, 1 Sep 2021 07:01:33 GMT
- Title: Efficient Person Search: An Anchor-Free Approach
- Authors: Yichao Yan, Jinpeng Li, Jie Qin, Shengcai Liao, Xiaokang Yang
- Abstract summary: Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach that tackles this challenging task efficiently.
- Score: 86.45858994806471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person search aims to simultaneously localize and identify a query person
from realistic, uncropped images. To achieve this goal, state-of-the-art models
typically add a re-id branch upon two-stage detectors like Faster R-CNN. Owing
to the ROI-Align operation, this pipeline yields promising accuracy as re-id
features are explicitly aligned with the corresponding object regions, but it
also introduces high computational overhead due to dense object anchors. In
this work, we present an anchor-free approach that tackles this challenging
task efficiently by introducing the following dedicated designs.
First, we select an anchor-free detector (i.e., FCOS) as the prototype of our
framework. Due to the lack of dense object anchors, it exhibits significantly
higher efficiency compared with existing person search models. Second, when
this anchor-free detector is directly adapted for person search, several major
challenges arise in learning robust re-id features, which we summarize as
misalignment issues at different levels (i.e., scale, region, and task).
To address these issues, we propose an aligned feature aggregation module to
generate more discriminative and robust feature embeddings. Accordingly, we
name our model the Feature-Aligned Person Search Network (AlignPS). Third, by
investigating the advantages of both anchor-based and anchor-free models, we
further augment AlignPS with an ROI-Align head, which significantly improves
the robustness of re-id features while still keeping our model highly
efficient. Extensive experiments conducted on two challenging benchmarks (i.e.,
CUHK-SYSU and PRW) demonstrate that our framework achieves state-of-the-art or
competitive performance, while displaying higher efficiency. The source
code, data, and trained models are available at:
https://github.com/daodaofr/alignps.
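The abstract attributes the overhead of two-stage pipelines to dense object anchors. As a rough illustration only, the sketch below counts candidate predictions for a per-location (anchor-free) head versus an anchor-based head on the same feature pyramid. The function names, the strides, and the number of anchors per location are illustrative assumptions, not values taken from the paper, and the count says nothing about measured runtime.

```python
import math


def locations_per_level(height, width, strides):
    """Number of feature-map locations at each pyramid level (hypothetical helper)."""
    return [math.ceil(height / s) * math.ceil(width / s) for s in strides]


def candidate_counts(height, width, strides, anchors_per_location):
    """Return (anchor_free, anchor_based) candidate counts for one image.

    Assumption: an anchor-free head emits one prediction per location,
    while an anchor-based head emits `anchors_per_location` per location.
    """
    anchor_free = sum(locations_per_level(height, width, strides))
    anchor_based = anchor_free * anchors_per_location
    return anchor_free, anchor_based


if __name__ == "__main__":
    # Illustrative input size, strides, and anchor count (all assumptions).
    free, based = candidate_counts(800, 1344, (8, 16, 32, 64, 128), 9)
    print(f"anchor-free candidates:  {free}")
    print(f"anchor-based candidates: {based}")
```

Under these assumptions the anchor-based count is simply the anchor-free count multiplied by the number of anchors per location, which is the sense in which the anchors are "dense".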
Related papers
- Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
- Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- Enhancing Object Detection for Autonomous Driving by Optimizing Anchor Generation and Addressing Class Imbalance [0.0]
This study presents an enhanced 2D object detector based on Faster R-CNN that is better suited for the context of autonomous vehicles.
The proposed modifications over the Faster R-CNN do not increase computational cost and can easily be extended to optimize other anchor-based detection frameworks.
arXiv Detail & Related papers (2021-04-08T16:58:31Z)
- Anchor-Free Person Search [127.88668724345195]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
Most existing works employ two-stage detectors like Faster-RCNN, yielding encouraging accuracy but with high computational overhead.
We present the Feature-Aligned Person Search Network (AlignPS), the first anchor-free framework to efficiently tackle this challenging task.
arXiv Detail & Related papers (2021-03-22T07:04:29Z)
- Sequential End-to-end Network for Efficient Person Search [7.3658840620058115]
Person search aims at jointly solving Person Detection and Person Re-identification (re-ID).
Existing works have designed end-to-end networks based on Faster R-CNN.
We propose a Sequential End-to-end Network (SeqNet) to extract superior features.
arXiv Detail & Related papers (2021-03-18T10:28:24Z)
- CFC-Net: A Critical Feature Capturing Network for Arbitrary-Oriented Object Detection in Remote Sensing Images [0.9462808515258465]
In this paper, we discuss the role of discriminative features in object detection.
We then propose a Critical Feature Capturing Network (CFC-Net) to improve detection accuracy.
We show that our method achieves superior detection performance compared with many state-of-the-art approaches.
arXiv Detail & Related papers (2021-01-18T02:31:09Z)
- SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [9.924083358178239]
We propose two variants of self-attention for contextual modeling in 3D object detection.
We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors.
Next, we propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations.
arXiv Detail & Related papers (2021-01-07T18:30:32Z)
- Scope Head for Accurate Localization in Object Detection [135.9979405835606]
We propose a novel detector coined as ScopeNet, which models anchors of each location as a mutually dependent relationship.
With our concise and effective design, the proposed ScopeNet achieves state-of-the-art results on COCO.
arXiv Detail & Related papers (2020-05-11T04:00:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.