Divide and Conquer: Hybrid Pre-training for Person Search
- URL: http://arxiv.org/abs/2312.07970v1
- Date: Wed, 13 Dec 2023 08:33:50 GMT
- Title: Divide and Conquer: Hybrid Pre-training for Person Search
- Authors: Yanling Tian, Di Chen, Yunan Liu, Jian Yang, Shanshan Zhang
- Abstract summary: We propose a hybrid pre-training framework specifically designed for person search using sub-task data only.
Our model can achieve significant improvements across diverse protocols, such as person search method, fine-tuning data, pre-training data and model backbone.
Our code and pre-trained models are released for plug-and-play usage to the person search community.
- Score: 40.13016375392472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale pre-training has proven to be an effective method for improving
performance across different tasks. Current person search methods use ImageNet
pre-trained models for feature extraction, yet this is suboptimal due to the
gap between the pre-training task and person search as a downstream task.
Therefore, in this paper, we focus on pre-training for person search, which
involves detecting and re-identifying individuals simultaneously.
Although labeled data for person search is scarce, datasets for its two
sub-tasks, person detection and re-identification, are relatively abundant. To this end, we
propose a hybrid pre-training framework specifically designed for person search
using sub-task data only. It consists of a hybrid learning paradigm that
handles data with different kinds of supervisions, and an intra-task alignment
module that alleviates domain discrepancy under limited resources. To the best
of our knowledge, this is the first work that investigates how to support
full-task pre-training using sub-task data. Extensive experiments demonstrate
that our pre-trained model can achieve significant improvements across diverse
protocols, such as person search method, fine-tuning data, pre-training data
and model backbone. For example, our model improves ResNet50-based NAE by a
10.3% relative gain in mAP. Our code and pre-trained models are released
for plug-and-play usage by the person search community.
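The abstract describes a hybrid learning paradigm in which a single backbone is pre-trained on sub-task data carrying different kinds of supervision (detection boxes vs. identity labels). A minimal sketch of that routing idea, assuming a shared feature extractor and per-supervision loss heads (all names and loss forms here are illustrative placeholders, not the authors' actual implementation):

```python
def backbone(images):
    # Stand-in for a shared feature extractor (e.g. a ResNet50).
    return [float(x) for x in images]

def detection_loss(features, boxes):
    # Placeholder for a detection objective (box regression + classification).
    return sum(abs(f - b) for f, b in zip(features, boxes))

def reid_loss(features, identity_labels):
    # Placeholder for a re-identification objective (identity classification).
    return sum(f * 0.1 for f in features) + 0.0 * len(identity_labels)

def hybrid_pretrain_step(batch):
    """Route one batch to the loss matching its supervision type."""
    features = backbone(batch["images"])
    if batch["supervision"] == "detection":
        return detection_loss(features, batch["boxes"])
    elif batch["supervision"] == "reid":
        return reid_loss(features, batch["ids"])
    raise ValueError(f"unknown supervision: {batch['supervision']}")

# Interleave batches from the two sub-task datasets into one
# pre-training stream, so the shared backbone sees both supervisions.
det_batch = {"supervision": "detection", "images": [1, 2], "boxes": [1.0, 1.5]}
reid_batch = {"supervision": "reid", "images": [3, 4], "ids": [7, 7]}
for batch in [det_batch, reid_batch]:
    loss = hybrid_pretrain_step(batch)
```

In a real system each branch would back-propagate into the shared backbone; the paper's intra-task alignment module, which addresses domain discrepancy across datasets, is omitted from this sketch.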
Related papers
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we utilize a zero-shot-based method to extract the most downstream task-related subset of the pre-training data.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and insufficient annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- A Memory-Related Multi-Task Method Based on Task-Agnostic Exploration [26.17597857264231]
In contrast to imitation learning, there is no expert data, only the data collected through environmental exploration.
Since the action sequence that solves a new task may combine trajectory segments from multiple training tasks, neither the test task nor its solving strategy appears directly in the training data.
We propose a Memory-related Multi-task Method (M3) to address this problem.
arXiv Detail & Related papers (2022-09-09T03:02:49Z)
- RPT: Toward Transferable Model on Heterogeneous Researcher Data via Pre-Training [19.987304448524043]
We propose a multi-task self-supervised learning-based researcher data pre-training model named RPT.
We divide the researchers' data into semantic document sets and community graph.
We propose three self-supervised learning objectives to train the whole model.
arXiv Detail & Related papers (2021-10-08T03:42:09Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We further apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.