Temporal Complementary Learning for Video Person Re-Identification
- URL: http://arxiv.org/abs/2007.09357v1
- Date: Sat, 18 Jul 2020 07:59:01 GMT
- Title: Temporal Complementary Learning for Video Person Re-Identification
- Authors: Ruibing Hou and Hong Chang and Bingpeng Ma and Shiguang Shan and Xilin Chen
- Abstract summary: This paper proposes a Temporal Complementary Learning Network that extracts complementary features of consecutive video frames for video person re-identification.
A saliency erasing operation drives the specific learner to mine new and complementary parts by erasing the parts activated by previous frames.
A Temporal Saliency Boosting (TSB) module is designed to propagate the salient information among video frames to enhance the salient feature.
- Score: 110.43147302200101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a Temporal Complementary Learning Network that extracts
complementary features of consecutive video frames for video person
re-identification. Firstly, we introduce a Temporal Saliency Erasing (TSE)
module including a saliency erasing operation and a series of ordered learners.
Specifically, for a specific frame of a video, the saliency erasing operation
drives the specific learner to mine new and complementary parts by erasing the
parts activated by previous frames. In this way, diverse visual features can
be discovered for consecutive frames and finally form an integral
characteristic of the target identity. Furthermore, a Temporal Saliency
Boosting (TSB) module is designed to propagate the salient information among
video frames to enhance the salient feature. It is complementary to TSE by
effectively alleviating the information loss caused by the erasing operation of
TSE. Extensive experiments show that our method performs favorably against
state-of-the-art methods. The source code is available at
https://github.com/blue-blue272/VideoReID-TCLNet.
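For illustration, the following is a minimal PyTorch-style sketch of the two ideas described in the abstract: a saliency-erasing step that masks out regions already activated by earlier frames so each frame-specific learner mines complementary parts, and a temporal boosting step that propagates salient information across frames. All module names, the quantile-based erasing rule, and the attention-style propagation are assumptions made for this sketch, not the authors' implementation; see the linked repository for the actual TSE and TSB modules.

```python
# Minimal, hypothetical sketch of the TSE/TSB ideas (NOT the authors' code).
# The erasing rule (top-20% activations) and the attention-style boosting are
# assumptions made for illustration only.
import torch
import torch.nn as nn


class SaliencyErasing(nn.Module):
    """Erase regions already activated by earlier frames so that the learner
    for the current frame is forced to mine new, complementary parts."""

    def __init__(self, in_channels, num_frames, erase_quantile=0.8):
        super().__init__()
        self.erase_quantile = erase_quantile
        # one ordered learner (here a plain conv block) per frame position
        self.learners = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, padding=1),
                nn.BatchNorm2d(in_channels),
                nn.ReLU(inplace=True),
            )
            for _ in range(num_frames)
        )

    def forward(self, feats):
        # feats: (B, T, C, H, W) frame-level feature maps
        b, t, c, h, w = feats.shape
        erased = torch.zeros(b, 1, h, w, device=feats.device)  # accumulated saliency mask
        outputs = []
        for i in range(t):
            x = feats[:, i] * (1.0 - erased)          # erase previously salient parts
            x = self.learners[i](x)                   # frame-specific learner
            saliency = x.mean(dim=1, keepdim=True)    # (B, 1, H, W) activation map
            thresh = torch.quantile(saliency.flatten(1), self.erase_quantile, dim=1)
            new_mask = (saliency > thresh.view(b, 1, 1, 1)).float()
            erased = torch.clamp(erased + new_mask, 0.0, 1.0)
            outputs.append(x)
        return torch.stack(outputs, dim=1)            # (B, T, C, H, W)


class TemporalSaliencyBoosting(nn.Module):
    """Propagate salient information across frames so that erased frames
    recover useful cues (a simple temporal attention is assumed here)."""

    def __init__(self, in_channels, reduction=4):
        super().__init__()
        hidden = in_channels // reduction
        self.query = nn.Linear(in_channels, hidden)
        self.key = nn.Linear(in_channels, hidden)

    def forward(self, feats):
        # feats: (B, T, C, H, W)
        b, t, c, _, _ = feats.shape
        pooled = feats.mean(dim=(3, 4))               # (B, T, C) frame descriptors
        q, k = self.query(pooled), self.key(pooled)   # (B, T, hidden)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        boosted = attn @ pooled                       # aggregate salient info over frames
        return feats + boosted.unsqueeze(-1).unsqueeze(-1)


# Toy usage: a batch of 2 tracklets with 4 frames of 256-channel feature maps.
feats = torch.randn(2, 4, 256, 16, 8)
out = TemporalSaliencyBoosting(256)(SaliencyErasing(256, num_frames=4)(feats))
print(out.shape)  # torch.Size([2, 4, 256, 16, 8])
```

In this sketch the boosted descriptor is simply broadcast back onto every spatial location; the paper's TSB module may propagate saliency in a more fine-grained way, so treat this only as a reading aid for the abstract.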
Related papers
- Disentangling spatio-temporal knowledge for weakly supervised object detection and segmentation in surgical video [10.287675722826028]
This paper introduces Video Spatio-Temporal Disentanglement Networks (VDST-Net) to disentangle information using semi-decoupled temporal knowledge distillation to predict high-quality class activation maps (CAMs).
We demonstrate the efficacy of our framework on a public reference dataset and on a more challenging surgical video dataset where objects are, on average, present in less than 60% of annotated frames.
arXiv Detail & Related papers (2024-07-22T16:52:32Z) - Spatio-temporal Prompting Network for Robust Video Feature Extraction [74.54597668310707]
Frame quality deterioration is one of the main challenges in the field of video understanding.
Recent approaches exploit transformer-based integration modules to obtain spatio-temporal information.
We present a neat and unified framework called the Spatio-Temporal Prompting Network (STPN).
It can efficiently extract video features by adjusting the input features in the network backbone.
arXiv Detail & Related papers (2024-02-04T17:52:04Z) - TF-CLIP: Learning Text-free CLIP for Video-based Person Re-Identification [60.5843635938469]
We propose a novel one-stage text-free CLIP-based learning framework named TF-CLIP for video-based person ReID.
More specifically, we extract the identity-specific sequence feature as the CLIP-Memory to replace the text feature.
Our proposed method shows much better results than other state-of-the-art methods on MARS, LS-VID and iLIDS-VID.
arXiv Detail & Related papers (2023-12-15T09:10:05Z) - Feature Disentanglement Learning with Switching and Aggregation for Video-based Person Re-Identification [9.068045610800667]
In video person re-identification (Re-ID), the network must consistently extract features of the target person from successive frames.
Existing methods tend to focus only on how to use temporal information, which often leads to networks being fooled by similar appearances and the same backgrounds.
We propose a Disentanglement and Switching and Aggregation Network (DSANet), which separates identity-representing features from camera-characteristic features and pays more attention to ID information.
arXiv Detail & Related papers (2022-12-16T04:27:56Z) - Self-Supervised Video Representation Learning with Motion-Contrastive Perception [13.860736711747284]
We propose a Motion-Contrastive Perception Network (MCPNet) for self-supervised video representation learning.
MCPNet consists of two branches, namely, Motion Information Perception (MIP) and Contrastive Instance Perception (CIP).
Our method outperforms current state-of-the-art visual-only self-supervised approaches.
arXiv Detail & Related papers (2022-04-10T05:34:46Z) - TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning [79.77010271213695]
We propose a novel video self-supervised learning framework named Temporal Contrastive Graph Learning (TCGL).
Our TCGL integrates prior knowledge about frame and snippet orders into graph structures, i.e., the intra-/inter-snippet Temporal Contrastive Graphs (TCG).
To generate supervisory signals for unlabeled videos, we introduce an Adaptive Snippet Order Prediction (ASOP) module.
arXiv Detail & Related papers (2021-12-07T09:27:56Z) - Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.