Global-Local Context Network for Person Search
- URL: http://arxiv.org/abs/2112.02500v1
- Date: Sun, 5 Dec 2021 07:38:53 GMT
- Title: Global-Local Context Network for Person Search
- Authors: Peng Zheng, Jie Qin, Yichao Yan, Shengcai Liao, Bingbing Ni, Xiaogang
Cheng and Ling Shao
- Abstract summary: Person search aims to jointly localize and identify a query person from natural, uncropped images.
We exploit rich context information globally and locally surrounding the target person, which we refer to as scene and group context, respectively.
We propose a unified global-local context network (GLCNet) with the intuitive aim of feature enhancement.
- Score: 125.51080862575326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Person search aims to jointly localize and identify a query person from
natural, uncropped images, which has been actively studied in the computer
vision community over the past few years. In this paper, we delve into the rich
context information globally and locally surrounding the target person, which
we refer to as scene and group context, respectively. Unlike previous works that
treat the two types of context individually, we exploit them in a unified
global-local context network (GLCNet) with the intuitive aim of feature
enhancement. Specifically, re-ID embeddings and context features are enhanced
simultaneously in a multi-stage fashion, ultimately leading to enhanced,
discriminative features for person search. We conduct experiments on two
person search benchmarks (i.e., CUHK-SYSU and PRW) and further extend our
approach to a more challenging setting (i.e., character search on MovieNet).
Extensive experimental results demonstrate the consistent improvement of the
proposed GLCNet over the state-of-the-art methods on the three datasets. Our
source codes, pre-trained models, and the new setting for character search are
available at: https://github.com/ZhengPeng7/GLCNet.
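The feature-enhancement idea in the abstract can be sketched in a few lines. This is a toy illustration only, assuming simple vector concatenation as the fusion step; all names are hypothetical, and the actual GLCNet operates on CNN feature maps with learned fusion modules rather than plain lists.

```python
# Toy sketch of GLCNet's multi-stage enhancement: a person's re-ID embedding
# is repeatedly fused with global (scene) and local (group) context features.
# Concatenation stands in for the paper's learned enhancement modules.

def enhance(reid_embedding, scene_context, group_context):
    """One enhancement stage: fuse the embedding with both context vectors
    by list concatenation (a stand-in for learned fusion)."""
    return reid_embedding + scene_context + group_context

def multi_stage_enhance(reid_embedding, scene_context, group_context, stages=2):
    """Each stage consumes the previously enhanced embedding, mimicking the
    'multi-stage fashion' described in the abstract."""
    feat = reid_embedding
    for _ in range(stages):
        feat = enhance(feat, scene_context, group_context)
    return feat

person = [0.1, 0.2]   # stand-in re-ID embedding
scene = [0.5]         # stand-in global scene context
group = [0.9]         # stand-in local group context
fused = multi_stage_enhance(person, scene, group, stages=2)
print(len(fused))  # 2 + 2 * (1 + 1) = 6
```

The point of the sketch is only that context is injected repeatedly rather than once, so each stage sees an already-enriched representation.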
Related papers
- Asynchronous Feedback Network for Perceptual Point Cloud Quality Assessment [18.65004981045047]
We propose a novel asynchronous feedback network (AFNet) to deal with global and local features.
AFNet employs a dual-branch structure, simulating the left and right hemispheres of the human brain, and constructs a feedback module between the two branches.
We conduct comprehensive experiments on three datasets and achieve superior performance over the state-of-the-art approaches on all of these datasets.
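The dual-branch-with-feedback structure summarized above can be sketched minimally. Everything here is a hypothetical simplification, assuming mean/max pooling as the two "views" and a scalar feedback term; the actual AFNet uses learned network branches and feedback modules.

```python
# Toy sketch of a dual-branch structure with feedback between the branches,
# in the spirit of the AFNet summary. Names and the averaging fusion are
# illustrative stand-ins, not the paper's actual modules.

def global_branch(x, feedback=0.0):
    # coarse "global" view: mean of the input, nudged by local feedback
    return sum(x) / len(x) + feedback

def local_branch(x, feedback=0.0):
    # fine "local" view: max of the input, nudged by global feedback
    return max(x) + feedback

def afnet_like(x, rounds=2, alpha=0.1):
    g, l = global_branch(x), local_branch(x)
    for _ in range(rounds):  # feedback: each branch receives the other's signal
        g, l = global_branch(x, alpha * l), local_branch(x, alpha * g)
    return (g + l) / 2       # fuse the two views into one score

print(round(afnet_like([1.0, 2.0, 3.0]), 4))  # 2.775
```

The key structural point is that the two branches do not run independently: each round, one branch's output modulates the other before fusion.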
arXiv Detail & Related papers (2024-07-13T08:52:44Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and the insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Generalizable Person Search on Open-world User-Generated Video Content [93.72028298712118]
Person search is a challenging task that involves retrieving individuals from a large set of uncropped scene images.
Existing person search applications are mostly trained and deployed in the same-origin scenarios.
We propose a generalizable framework on both feature-level and data-level generalization to facilitate downstream tasks in arbitrary scenarios.
arXiv Detail & Related papers (2023-10-16T04:59:50Z)
- Learning to Discover and Detect Objects [43.52208526783969]
We tackle the problem of novel class discovery, detection, and localization (NCDL).
In this setting, we assume a source dataset with labels for objects of commonly observed classes.
By training our detection network with this objective in an end-to-end manner, it learns to classify all region proposals for a large variety of classes.
arXiv Detail & Related papers (2022-10-19T17:59:55Z)
- OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search [34.460973847554364]
We address the task of person search, that is, localizing and re-identifying query persons from a set of raw scene images.
Recent approaches are typically built upon OIMNet, a pioneering work on person search that learns joint person representations for performing both detection and person re-identification.
We introduce a novel normalization layer, dubbed ProtoNorm, that calibrates features from pedestrian proposals, while considering a long-tail distribution of person IDs.
arXiv Detail & Related papers (2022-07-21T06:34:03Z)
- Exploring Visual Context for Weakly Supervised Person Search [155.46727990750227]
Person search has recently emerged as a challenging task that jointly addresses pedestrian detection and person re-identification.
Existing approaches follow a fully supervised setting where both bounding box and identity annotations are available.
This paper instead considers weakly supervised person search with only bounding box annotations.
arXiv Detail & Related papers (2021-06-19T14:47:13Z)
- Watching You: Global-guided Reciprocal Learning for Video-based Person Re-identification [82.6971648465279]
We propose a novel Global-guided Reciprocal Learning framework for video-based person Re-ID.
Our approach achieves better performance than other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-07T12:27:42Z)
- Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule [80.0853069632445]
Vision-and-language navigation (VLN) is a task in which an agent is embodied in a realistic 3D environment and follows an instruction to reach the goal node.
In this paper, we design and investigate a generative language-grounded policy which uses a language model to compute the distribution over all possible instructions.
In experiments, we show that the proposed generative approach outperforms the discriminative approach in the Room-2-Room (R2R) and Room-4-Room (R4R) datasets, especially in the unseen environments.
arXiv Detail & Related papers (2020-09-16T16:23:17Z)
- A Convolutional Baseline for Person Re-Identification Using Vision and Language Descriptions [24.794592610444514]
In real-world surveillance scenarios, visual information about the queried person is frequently unavailable.
A two-stream deep convolutional neural network framework supervised by a cross-entropy loss is presented.
The learnt visual representations are more robust and perform 22% better during retrieval than a single-modality system.
arXiv Detail & Related papers (2020-02-20T10:12:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.