Generalizable Person Search on Open-world User-Generated Video Content
- URL: http://arxiv.org/abs/2310.10068v1
- Date: Mon, 16 Oct 2023 04:59:50 GMT
- Title: Generalizable Person Search on Open-world User-Generated Video Content
- Authors: Junjie Li, Guanshuo Wang, Yichao Yan, Fufu Yu, Qiong Jia, Jie Qin,
Shouhong Ding, Xiaokang Yang
- Abstract summary: Person search is a challenging task that involves retrieving individuals from a large set of un-cropped scene images.
Existing person search applications are mostly trained and deployed in the same-origin scenarios.
We propose a generalizable framework on both feature-level and data-level generalization to facilitate downstream tasks in arbitrary scenarios.
- Score: 93.72028298712118
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Person search is a challenging task that involves detecting and retrieving
individuals from a large set of un-cropped scene images. Existing person search
applications are mostly trained and deployed in the same-origin scenarios.
However, collecting and annotating training samples for each scene is often
difficult due to limited resources and labor costs. Moreover,
large-scale intra-domain training data are generally not legally available
to common developers because of privacy and public-security regulations.
Leveraging easily accessible large-scale User Generated Video Contents
(\emph{i.e.}, UGC videos) to train person search models can fit the open-world
distribution, but such models still suffer a performance gap caused by the
domain difference from surveillance scenes. In this work, we explore enhancing the out-of-domain
generalization capabilities of person search models, and propose a
generalizable framework on both feature-level and data-level generalization to
facilitate downstream tasks in arbitrary scenarios. Specifically, we focus on
learning domain-invariant representations for both detection and ReID by
introducing a multi-task prototype-based domain-specific batch normalization,
and a channel-wise ID-relevant feature decorrelation strategy. We also identify
and address typical sources of noise in open-world training frames, including
inaccurate bounding boxes, the omission of identity labels, and the absence of
cross-camera data. Our framework achieves promising performance on two
challenging person search benchmarks without using any human annotation or
samples from the target domain.
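The domain-specific batch normalization idea mentioned above can be illustrated with a minimal sketch. This is not the authors' code; the class and variable names are our own, and the prototype-assignment and multi-task aspects of the paper's full method are omitted. The core mechanism shown is that each source domain keeps its own normalization statistics, so domain-dependent appearance shifts are absorbed per branch while the affine parameters remain shared:

```python
import numpy as np

class DomainSpecificBN:
    """Minimal sketch of domain-specific batch normalization:
    each source domain maintains its own running mean/variance,
    while the affine parameters (gamma, beta) are shared across
    domains, encouraging domain-invariant features downstream."""

    def __init__(self, num_features, num_domains, momentum=0.1, eps=1e-5):
        self.means = np.zeros((num_domains, num_features))
        self.vars = np.ones((num_domains, num_features))
        self.gamma = np.ones(num_features)   # shared scale
        self.beta = np.zeros(num_features)   # shared shift
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, domain, training=True):
        # x: (batch, num_features); domain: index of the source domain
        if training:
            mu, var = x.mean(axis=0), x.var(axis=0)
            # update only this domain's running statistics
            self.means[domain] = (1 - self.momentum) * self.means[domain] + self.momentum * mu
            self.vars[domain] = (1 - self.momentum) * self.vars[domain] + self.momentum * var
        else:
            mu, var = self.means[domain], self.vars[domain]
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

# Example: two domains with very different feature statistics are each
# normalized by their own branch, so both come out roughly standardized.
rng = np.random.default_rng(0)
bn = DomainSpecificBN(num_features=4, num_domains=2)
x_a = rng.normal(5.0, 2.0, size=(64, 4))   # "surveillance-like" domain
x_b = rng.normal(-3.0, 0.5, size=(64, 4))  # "UGC-like" domain
y_a = bn(x_a, domain=0)
y_b = bn(x_b, domain=1)
```

A single shared BN layer would instead mix the two domains' statistics, entangling domain identity with the features; keeping per-domain branches is one common way to factor that out.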
Related papers
- Diverse Deep Feature Ensemble Learning for Omni-Domain Generalized Person Re-identification [30.208890289394994]
Person ReID methods experience a significant drop in performance when trained and tested across different datasets.
Our research reveals that domain generalization methods significantly underperform single-domain supervised methods on single dataset benchmarks.
We propose a way to achieve ODG-ReID by creating deep feature diversity with self-ensembles.
arXiv Detail & Related papers (2024-10-11T02:27:11Z)
- FedSIS: Federated Split Learning with Intermediate Representation Sampling for Privacy-preserving Generalized Face Presentation Attack Detection [4.1897081000881045]
Lack of generalization to unseen domains/attacks is the Achilles heel of most face presentation attack detection (FacePAD) algorithms.
In this work, a novel framework called Federated Split learning with Intermediate representation Sampling (FedSIS) is introduced for privacy-preserving domain generalization.
arXiv Detail & Related papers (2023-08-20T11:49:12Z)
- Deep Multimodal Fusion for Generalizable Person Re-identification [15.250738959921872]
DMF is a Deep Multimodal Fusion network for the general scenarios on person re-identification task.
Rich semantic knowledge is introduced to assist in feature representation learning during the pre-training stage.
A realistic dataset is adopted to fine-tune the pre-trained model for distribution alignment with the real world.
arXiv Detail & Related papers (2022-11-02T07:42:48Z)
- Global-Local Context Network for Person Search [125.51080862575326]
Person search aims to jointly localize and identify a query person from natural, uncropped images.
We exploit rich context information globally and locally surrounding the target person, which we refer to as scene and group context, respectively.
We propose a unified global-local context network (GLCNet) with the intuitive aim of feature enhancement.
arXiv Detail & Related papers (2021-12-05T07:38:53Z)
- Semi-Supervised Domain Generalizable Person Re-Identification [74.75528879336576]
Existing person re-identification (re-id) methods are stuck when deployed to a new unseen scenario.
Recent efforts have been devoted to domain adaptive person re-id where extensive unlabeled data in the new scenario are utilized in a transductive learning manner.
We aim to explore multiple labeled datasets to learn generalized domain-invariant representations for person re-id.
arXiv Detail & Related papers (2021-08-11T06:08:25Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
- Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes [131.9067467127761]
We focus on improving the action recognition by fully-utilizing the information of scenes and collecting new data.
Specifically, we adopt a strong human detector to localize each person in every frame.
We then apply action recognition models to learn the temporal information from video frames on both the HIE dataset and new data with diverse scenes from the internet.
arXiv Detail & Related papers (2020-10-16T13:08:50Z)
- A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video [120.18562044084678]
Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years.
We propose a background-agnostic framework that learns from training videos containing only normal events.
arXiv Detail & Related papers (2020-08-27T18:39:24Z)
- One-Shot Unsupervised Cross-Domain Detection [33.04327634746745]
This paper presents an object detection algorithm able to perform unsupervised adaption across domains by using only one target sample, seen at test time.
We achieve this by introducing a multi-task architecture that one-shot adapts to any incoming sample by iteratively solving a self-supervised task on it.
arXiv Detail & Related papers (2020-05-23T22:12:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.