Watching You: Global-guided Reciprocal Learning for Video-based Person
Re-identification
- URL: http://arxiv.org/abs/2103.04337v1
- Date: Sun, 7 Mar 2021 12:27:42 GMT
- Authors: Xuehu Liu and Pingping Zhang and Chenyang Yu and Huchuan Lu and
Xiaoyun Yang
- Abstract summary: We propose a novel Global-guided Reciprocal Learning framework for video-based person Re-ID.
Our approach can achieve better performance than other state-of-the-art approaches.
- Score: 82.6971648465279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video-based person re-identification (Re-ID) aims to automatically retrieve
video sequences of the same person under non-overlapping cameras. To achieve
this goal, it is the key to fully utilize abundant spatial and temporal cues in
videos. Existing methods usually focus on the most conspicuous image regions,
and thus may miss fine-grained clues due to appearance variations across
image sequences. To address these issues, in this paper we propose a novel
Global-guided Reciprocal Learning (GRL) framework for video-based person Re-ID.
Specifically, we first propose a Global-guided Correlation Estimation (GCE) to
generate feature correlation maps of local features and global features, which
help to localize the high- and low-correlation regions for identifying the same
person. After that, the discriminative features are disentangled into
high-correlation features and low-correlation features under the guidance of
the global representations. Moreover, a novel Temporal Reciprocal Learning
(TRL) mechanism is designed to sequentially enhance the high-correlation
semantic information and accumulate the low-correlation sub-critical clues.
Extensive experiments on three public benchmarks indicate that our approach can
achieve better performance than other state-of-the-art approaches.
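The abstract describes two components: a Global-guided Correlation Estimation (GCE) that correlates local features with a global representation to separate high- and low-correlation regions, and a Temporal Reciprocal Learning (TRL) mechanism that processes the two groups differently over time. The paper does not give implementation details here, so the following is only a minimal sketch of the GCE idea under simple assumptions: cosine similarity as the correlation measure, mean pooling for the global descriptor, and a fixed threshold for the high/low split (all of these are illustrative choices, not the authors' method).

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Normalize vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def global_guided_correlation(local_feats, global_feat, thresh=0.5):
    """Split local features into high-/low-correlation groups w.r.t. a global feature.

    local_feats: (N, C) array of local descriptors from one frame (N = H*W locations).
    global_feat: (C,) global frame descriptor guiding the estimation.
    thresh: illustrative cutoff for the high/low split (a hypothetical choice).
    Returns the correlation map and boolean masks for the two groups.
    """
    ln = l2_normalize(local_feats)   # (N, C) unit-length local descriptors
    gn = l2_normalize(global_feat)   # (C,) unit-length global descriptor
    corr = ln @ gn                   # (N,) cosine similarity per spatial location
    high_mask = corr >= thresh       # regions strongly tied to the global identity cue
    low_mask = ~high_mask            # sub-critical regions holding fine-grained clues
    return corr, high_mask, low_mask

# Toy usage with random features standing in for CNN outputs.
rng = np.random.default_rng(0)
F = rng.standard_normal((16, 8))   # 16 spatial locations, 8-dim features
g = F.mean(axis=0)                 # global descriptor via mean pooling (an assumption)
corr, hi, lo = global_guided_correlation(F, g)
```

In the full framework, the high-correlation group would feed the enhancement branch of TRL while the low-correlation group accumulates sub-critical clues across frames; that temporal part is not sketched here.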
Related papers
- Local-Global Associative Frame Assemble in Video Re-ID [57.7470971197962]
Noisy and unrepresentative frames in automatically generated object bounding boxes from video sequences cause challenges in learning discriminative representations in video re-identification (Re-ID).
Most existing methods tackle this problem by assessing the importance of video frames according to either their local part alignments or global appearance correlations separately.
In this work, we explore jointly both local alignments and global correlations with further consideration of their mutual promotion/reinforcement.
arXiv Detail & Related papers (2021-10-22T19:07:39Z) - Video Salient Object Detection via Adaptive Local-Global Refinement [7.723369608197167]
Video salient object detection (VSOD) is an important task in many vision applications.
We propose an adaptive local-global refinement framework for VSOD.
We show that our weighting methodology can further exploit the feature correlations, thus driving the network to learn more discriminative feature representation.
arXiv Detail & Related papers (2021-04-29T14:14:11Z) - Spatial-Temporal Correlation and Topology Learning for Person
Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
The proposed CTL framework utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and the physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z) - GTA: Global Temporal Attention for Video Action Understanding [51.476605514802806]
We introduce Global Temporal Attention (GTA), which performs global temporal attention on top of spatial attention in a decoupled manner.
Tests on 2D and 3D networks demonstrate that our approach consistently enhances temporal modeling and provides state-of-the-art performance on three video action recognition datasets.
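Decoupling temporal attention from spatial attention, as this summary describes, means attending along the time axis independently at each spatial location rather than over the joint space-time grid. A minimal sketch of that idea (not the GTA paper's actual formulation, which adds further components) could look like:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decoupled_temporal_attention(feats):
    """Self-attention along the time axis only, applied per spatial location.

    feats: (T, N, C) — T frames, N spatial locations, C channels.
    Returns features of the same shape, with each location's sequence
    re-weighted over time; spatial interactions are left untouched.
    """
    T, N, C = feats.shape
    x = feats.transpose(1, 0, 2)                    # (N, T, C): one temporal sequence per location
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(C)  # (N, T, T) frame-to-frame affinities
    attn = softmax(scores, axis=-1)                 # each frame attends over all T frames
    out = attn @ x                                  # (N, T, C) temporally aggregated features
    return out.transpose(1, 0, 2)                   # back to (T, N, C)

# Toy usage: 4 frames, 6 spatial locations, 8 channels.
feats = np.random.default_rng(1).standard_normal((4, 6, 8))
out = decoupled_temporal_attention(feats)
```

Because the attention matrix is (T, T) per location rather than (T*N, T*N), this decoupled form is much cheaper than full space-time attention.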
arXiv Detail & Related papers (2020-12-15T18:58:21Z) - Gait Recognition via Effective Global-Local Feature Representation and
Local Temporal Aggregation [28.721376937882958]
Gait recognition is one of the most important biometric technologies and has been applied in many fields.
Recent gait recognition frameworks represent each gait frame by descriptors extracted from either global appearances or local regions of humans.
We propose a novel feature extraction and fusion framework to achieve discriminative feature representations for gait recognition.
arXiv Detail & Related papers (2020-11-03T04:07:13Z) - A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D
Skeleton Based Person Re-Identification [65.18004601366066]
Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly-emerging topic with several advantages.
This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID.
arXiv Detail & Related papers (2020-09-05T16:06:04Z) - Multi-Granularity Reference-Aided Attentive Feature Aggregation for
Video-based Person Re-identification [98.7585431239291]
Video-based person re-identification aims at matching the same person across video clips.
In this paper, we propose an attentive feature aggregation module, namely the Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA) module.
Our framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2020-03-27T03:49:21Z)