Swap Path Network for Robust Person Search Pre-training
- URL: http://arxiv.org/abs/2412.05433v1
- Date: Fri, 06 Dec 2024 21:35:26 GMT
- Title: Swap Path Network for Robust Person Search Pre-training
- Authors: Lucas Jaffe, Avideh Zakhor
- Abstract summary: We present the first framework for end-to-end person search pre-training.
We show that our method is more effective, efficient, and robust for person search pre-training than recent backbone-only pre-training alternatives.
- Abstract: In person search, we detect and rank matches to a query person image within a set of gallery scenes. Most person search models make use of a feature extraction backbone, followed by separate heads for detection and re-identification. While pre-training methods for vision backbones are well-established, pre-training additional modules for the person search task has not been previously examined. In this work, we present the first framework for end-to-end person search pre-training. Our framework splits person search into object-centric and query-centric methodologies, and we show that the query-centric framing is robust to label noise, and trainable using only weakly-labeled person bounding boxes. Further, we provide a novel model dubbed Swap Path Net (SPNet) which implements both query-centric and object-centric training objectives, and can swap between the two while using the same weights. Using SPNet, we show that query-centric pre-training, followed by object-centric fine-tuning, achieves state-of-the-art results on the standard PRW and CUHK-SYSU person search benchmarks, with 96.4% mAP on CUHK-SYSU and 61.2% mAP on PRW. In addition, we show that our method is more effective, efficient, and robust for person search pre-training than recent backbone-only pre-training alternatives.
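The retrieval step in the abstract's first sentence and the mAP figures it reports can be made concrete with a short sketch. Below is a minimal, generic Python sketch of two things:

- ranking the persons detected across gallery scenes by cosine similarity to a query embedding;
- scoring that ranking with average precision, the per-query quantity that mAP averages.

This is not SPNet, and it makes several assumptions:

- The embeddings are given, not produced by a network.
- Each `is_match` flag is assumed to come from an upstream step that matches detections to the query identity's ground-truth boxes.
- The helper names (`rank_gallery`, `average_precision`, `mean_average_precision`) are hypothetical.
- The official CUHK-SYSU and PRW evaluation protocols include details omitted here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record: (embedding of one detected person, is it a true match?)
Detection = Tuple[Sequence[float], bool]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_gallery(query: Sequence[float], gallery: List[Detection]) -> List[Tuple[float, bool]]:
    """Score every detected gallery person against the query, best first.

    `gallery` pools the detections from all gallery scenes; each `is_match`
    flag is assumed to be precomputed by an upstream box-matching step.
    """
    scored = [(cosine_similarity(query, emb), is_match) for emb, is_match in gallery]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def average_precision(ranked: List[Tuple[float, bool]], num_gt_matches: int) -> float:
    """Non-interpolated AP of one ranked list.

    Dividing by the number of ground-truth matches (not by the hits that
    appear in the list) penalizes matches the detector never proposed.
    """
    if num_gt_matches == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_match) in enumerate(ranked, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this hit's rank
    return precision_sum / num_gt_matches


def mean_average_precision(aps: List[float]) -> float:
    """mAP: the mean of per-query AP values."""
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    query = [1.0, 0.0]
    gallery = [([0.9, 0.1], True), ([0.8, 0.2], False), ([0.7, 0.7], True), ([0.0, 1.0], False)]
    ranked = rank_gallery(query, gallery)
    print(round(average_precision(ranked, num_gt_matches=2), 4))  # 0.8333
    print(round(average_precision(ranked, num_gt_matches=3), 4))  # 0.5556 (one match missed)
```

Note the normalization in this sketch. AP is divided by the number of ground-truth matches rather than by the hits found. So a person the detector misses lowers AP even though that person never appears in the ranking. This is one way detection quality and re-identification quality can both show up in a single person search score.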
Related papers
- CLEAR: Cross-Transformers with Pre-trained Language Model is All you need for Person Attribute Recognition and Retrieval (2024-03-10T07:31:06Z)
Person attribute recognition and attribute-based retrieval are two core human-centric tasks.
We introduce a robust cross-transformers network to handle person attribute recognition.
We also introduce an effective training strategy that trains only a few additional adapter parameters.
CLEAR achieves state-of-the-art or competitive results on both tasks.
- Learning to Retrieve for Job Matching (2024-02-21T00:05:25Z)
We discuss applying learning-to-retrieve technology to enhance LinkedIn's job search and recommendation systems.
We leverage confirmed hire data to construct a graph that evaluates a seeker's qualification for a job, and use the learned links for retrieval.
In addition to a solution based on a conventional inverted index, we developed an on-GPU solution that efficiently supports both KNN and term matching.
- Divide and Conquer: Hybrid Pre-training for Person Search (2023-12-13T08:33:50Z)
We propose a hybrid pre-training framework specifically designed for person search using sub-task data only.
Our model achieves significant improvements across diverse protocols, such as person search method, fine-tuning data, pre-training data, and model backbone.
Our code and pre-trained models are released for plug-and-play use by the person search community.
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search (2023-11-15T16:26:49Z)
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is challenging due to the significant modality gap, fine-grained differences, and insufficient annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
- Generalizable Person Search on Open-world User-Generated Video Content (2023-10-16T04:59:50Z)
Person search is a challenging task that involves retrieving individuals from a large set of uncropped scene images.
Existing person search applications are mostly trained and deployed in same-origin scenarios.
We propose a framework for both feature-level and data-level generalization to facilitate downstream tasks in arbitrary scenarios.
- PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement (2023-09-20T08:16:39Z)
We present PSDiff, a novel person search framework based on the diffusion model.
PSDiff formulates person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths.
Following this new paradigm, we further design a Collaborative Denoising Layer (CDL) that optimizes the detection and ReID sub-tasks iteratively and collaboratively.
- Global-Local Context Network for Person Search (2021-12-05T07:38:53Z)
Person search aims to jointly localize and identify a query person from natural, uncropped images.
We exploit rich context information globally and locally surrounding the target person, which we refer to as scene and group context, respectively.
We propose a unified global-local context network (GLCNet) with the intuitive aim of feature enhancement.
- Exploring Visual Context for Weakly Supervised Person Search (2021-06-19T14:47:13Z)
Person search has recently emerged as a challenging task that jointly addresses pedestrian detection and person re-identification.
Existing approaches follow a fully supervised setting where both bounding box and identity annotations are available.
This paper inventively considers weakly supervised person search with only bounding box annotations.
- Diverse Knowledge Distillation for End-to-End Person Search (2020-12-21T09:04:27Z)
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups: two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
This list is automatically generated from the titles and abstracts of the papers on this site.