Swap Path Network for Robust Person Search Pre-training

URL: http://arxiv.org/abs/2412.05433v1
Date: Fri, 06 Dec 2024 21:35:26 GMT
Title: Swap Path Network for Robust Person Search Pre-training
Authors: Lucas Jaffe, Avideh Zakhor
Abstract summary: We present the first framework for end-to-end person search pre-training. We show that our method is more effective, efficient, and robust for person search pre-training than recent backbone-only pre-training alternatives.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In person search, we detect and rank matches to a query person image within a set of gallery scenes. Most person search models make use of a feature extraction backbone, followed by separate heads for detection and re-identification. While pre-training methods for vision backbones are well-established, pre-training additional modules for the person search task has not been previously examined. In this work, we present the first framework for end-to-end person search pre-training. Our framework splits person search into object-centric and query-centric methodologies, and we show that the query-centric framing is robust to label noise, and trainable using only weakly-labeled person bounding boxes. Further, we provide a novel model dubbed Swap Path Net (SPNet) which implements both query-centric and object-centric training objectives, and can swap between the two while using the same weights. Using SPNet, we show that query-centric pre-training, followed by object-centric fine-tuning, achieves state-of-the-art results on the standard PRW and CUHK-SYSU person search benchmarks, with 96.4% mAP on CUHK-SYSU and 61.2% mAP on PRW. In addition, we show that our method is more effective, efficient, and robust for person search pre-training than recent backbone-only pre-training alternatives.
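The abstract frames person search as detecting people in gallery scenes and ranking those detections against a query person, with results reported as mAP. The sketch below illustrates only that ranking-and-scoring step, under assumptions that do not come from the paper: re-identification embeddings for the query and for every gallery detection are already computed; each detection has been labeled a match or non-match upstream (for example, by an IoU threshold against the query identity's ground-truth boxes); and `num_gt` counts every ground-truth occurrence of the query person, so people the detector missed lower the score. The function names are hypothetical. This is not SPNet's code or the official PRW/CUHK-SYSU evaluation script.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_precision(ranked_is_match: List[bool], num_gt: int) -> float:
    """Non-interpolated AP over a ranked list.

    Dividing by num_gt (all ground-truth occurrences, detected or not)
    means undetected matches count as never retrieved.
    """
    if num_gt == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_is_match, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this hit
    return precision_sum / num_gt


def query_ap(query_emb: Sequence[float],
             gallery_embs: List[Sequence[float]],
             gallery_is_match: List[bool],
             num_gt: int) -> float:
    """Rank gallery detections by similarity to the query, then score the ranking."""
    scores = [cosine(query_emb, g) for g in gallery_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [gallery_is_match[i] for i in order]
    return average_precision(ranked, num_gt)


# Hypothetical toy example: two detected matches, one match the detector missed.
query = [1.0, 0.0]
gallery = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
labels = [True, False, True]
print(round(query_ap(query, gallery, labels, num_gt=3), 4))  # 0.6667
```

Both detected matches rank above the non-match, so they are retrieved with precision 1.0. The missed third occurrence still caps AP at 2/3. mAP would then be the mean of `query_ap` over all queries.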

Related papers

MMSearch-R1: Incentivizing LMMs to Search
We present MMSearch-R1, the first end-to-end reinforcement learning framework that enables on-demand, multi-turn search in real-world Internet environments. Our framework integrates both image and text search tools, allowing the model to reason about when and how to invoke them guided by an outcome-based reward with a search penalty.
arXiv (2025-06-25T17:59:42Z)
CLEAR: Cross-Transformers with Pre-trained Language Model is All you need for Person Attribute Recognition and Retrieval
Person attribute recognition and attribute-based retrieval are two core human-centric tasks. We introduce a robust cross-transformers network to handle person attribute recognition. We also introduce an effective training strategy to train only a few additional parameters for adapters. CLEAR achieves state-of-the-art performance or competitive results for both tasks.
arXiv (2024-03-10T07:31:06Z)
Learning to Retrieve for Job Matching
We discuss applying learning-to-retrieve technology to enhance LinkedIn's job search and recommendation systems. We leverage confirmed hire data to construct a graph that evaluates a seeker's qualification for a job, and utilize learned links for retrieval. In addition to a solution based on a conventional inverted index, we developed an on-GPU solution capable of supporting both KNN and term matching efficiently.
arXiv (2024-02-21T00:05:25Z)
Divide and Conquer: Hybrid Pre-training for Person Search
We propose a hybrid pre-training framework specifically designed for person search using sub-task data only. Our model can achieve significant improvements across diverse protocols, such as person search method, fine-tuning data, pre-training data and model backbone. Our code and pre-trained models are released for plug-and-play usage to the person search community.
arXiv (2023-12-13T08:33:50Z)
Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery. Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data. In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv (2023-11-15T16:26:49Z)
Generalizable Person Search on Open-world User-Generated Video Content
Person search is a challenging task that involves retrieving individuals from a large set of un-cropped scene images. Existing person search applications are mostly trained and deployed in the same-origin scenarios. We propose a generalizable framework on both feature-level and data-level generalization to facilitate downstream tasks in arbitrary scenarios.
arXiv (2023-10-16T04:59:50Z)
PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement
We present a novel Person Search framework based on the Diffusion model, PSDiff. PSDiff formulates the person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths. Following the new paradigm, we further design a new Collaborative Denoising Layer (CDL) to optimize detection and ReID sub-tasks in an iterative and collaborative way.
arXiv (2023-09-20T08:16:39Z)
Global-Local Context Network for Person Search
Person search aims to jointly localize and identify a query person from natural, uncropped images. We exploit rich context information globally and locally surrounding the target person, which we refer to as scene and group context, respectively. We propose a unified global-local context network (GLCNet) with the intuitive aim of feature enhancement.
arXiv (2021-12-05T07:38:53Z)
Exploring Visual Context for Weakly Supervised Person Search
Person search has recently emerged as a challenging task that jointly addresses pedestrian detection and person re-identification. Existing approaches follow a fully supervised setting where both bounding box and identity annotations are available. This paper inventively considers weakly supervised person search with only bounding box annotations.
arXiv (2021-06-19T14:47:13Z)
Diverse Knowledge Distillation for End-to-End Person Search
Person search aims to localize and identify a specific person from a gallery of images. Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches. We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv (2020-12-21T09:04:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.