Context Sensing Attention Network for Video-based Person
Re-identification
- URL: http://arxiv.org/abs/2207.02631v1
- Date: Wed, 6 Jul 2022 12:48:27 GMT
- Title: Context Sensing Attention Network for Video-based Person
Re-identification
- Authors: Kan Wang, Changxing Ding, Jianxin Pang, Xiangmin Xu
- Abstract summary: Video-based person re-identification (ReID) is challenging due to the presence of various interferences in video frames.
Recent approaches handle this problem using temporal aggregation strategies.
We propose a novel Context Sensing Attention Network (CSA-Net), which improves both the frame feature extraction and temporal aggregation steps.
- Score: 20.865710012336724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video-based person re-identification (ReID) is challenging due to the
presence of various interferences in video frames. Recent approaches handle
this problem using temporal aggregation strategies. In this work, we propose a
novel Context Sensing Attention Network (CSA-Net), which improves both the
frame feature extraction and temporal aggregation steps. First, we introduce
the Context Sensing Channel Attention (CSCA) module, which emphasizes responses
from informative channels for each frame. These informative channels are
identified with reference not only to each individual frame, but also to the
content of the entire sequence. Therefore, CSCA explores both the individuality
of each frame and the global context of the sequence. Second, we propose the
Contrastive Feature Aggregation (CFA) module, which predicts frame weights for
temporal aggregation. Here, the weight for each frame is determined in a
contrastive manner: i.e., not only by the quality of each individual frame, but
also by the average quality of the other frames in a sequence. Therefore, it
effectively promotes the contribution of relatively good frames. Extensive
experimental results on four datasets show that CSA-Net consistently achieves
state-of-the-art performance.
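The abstract describes the two modules concretely enough to sketch in code. The PyTorch-style snippet below is a minimal illustration of the ideas only, not the authors' implementation: the pooling operations, the concatenation-based fusion of frame and sequence context, the MLP with sigmoid gating in CSCA, and the linear quality scorer with softmax weighting in CFA are all assumptions made for the sketch.

```python
import torch
import torch.nn as nn


class CSCA(nn.Module):
    """Context Sensing Channel Attention (illustrative sketch).

    Re-weights the channels of each frame's feature map using both a
    per-frame descriptor and a sequence-level (global) descriptor, so the
    attention reflects the individual frame and the whole sequence.
    """

    def __init__(self, channels, reduction=16):
        super().__init__()
        # Input is the per-frame descriptor concatenated with the
        # sequence descriptor (an assumed fusion scheme).
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        # x: (B, T, C, H, W) feature maps for a T-frame sequence.
        b, t, c, _, _ = x.shape
        frame_desc = x.mean(dim=(3, 4))                  # (B, T, C) frame context
        seq_desc = frame_desc.mean(dim=1, keepdim=True)  # (B, 1, C) global context
        ctx = torch.cat([frame_desc, seq_desc.expand(-1, t, -1)], dim=-1)
        attn = torch.sigmoid(self.mlp(ctx))              # (B, T, C) channel gates
        return x * attn.view(b, t, c, 1, 1)


class CFA(nn.Module):
    """Contrastive Feature Aggregation (illustrative sketch).

    Weights each frame by comparing its estimated quality against the
    average quality of the other frames in the sequence, then aggregates
    the frame features into a single sequence representation.
    """

    def __init__(self, channels):
        super().__init__()
        self.quality = nn.Linear(channels, 1)  # assumed per-frame quality scorer

    def forward(self, f):
        # f: (B, T, C) frame-level feature vectors.
        _, t, _ = f.shape
        q = self.quality(f).squeeze(-1)                  # (B, T) per-frame quality
        # Leave-one-out mean: average quality of the *other* frames.
        others = (q.sum(dim=1, keepdim=True) - q) / max(t - 1, 1)
        w = torch.softmax(q - others, dim=1)             # contrastive frame weights
        return (w.unsqueeze(-1) * f).sum(dim=1)          # (B, C) sequence feature
```

Compared with plain temporal averaging, the contrastive weighting in this sketch raises the contribution of frames whose quality is above that of their peers, which is the behaviour the abstract attributes to the CFA module.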
Related papers
- End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling [43.024232182899354]
We propose VidF4, a novel VideoQA framework equipped with a tailored frame selection strategy for effective and efficient VideoQA.
We propose three frame-scoring mechanisms that consider both question relevance and inter-frame similarity to evaluate the importance of each frame for a given question about the video.
The experimental results across three widely adopted benchmarks demonstrate that our model consistently outperforms existing VideoQA methods.
arXiv Detail & Related papers (2024-07-21T04:09:37Z)
- Correspondence Matters for Video Referring Expression Comprehension [64.60046797561455]
Video Referring Expression Comprehension (REC) aims to localize the referent objects described in the sentence to visual regions in the video frames.
Existing methods suffer from two problems: 1) inconsistent localization results across video frames; 2) confusion between the referent and contextual objects.
We propose a novel Dual Correspondence Network (dubbed DCNet) which explicitly enhances the dense associations in both inter-frame and cross-modal manners.
arXiv Detail & Related papers (2022-07-21T10:31:39Z)
- MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization [61.69587867308656]
We propose a multimodal hierarchical shot-aware convolutional network, denoted as MHSCNet, to enhance the frame-wise representation.
Based on the learned shot-aware representations, MHSCNet can predict the frame-level importance score in the local and global view of the video.
arXiv Detail & Related papers (2022-04-18T14:53:33Z)
- OCSampler: Compressing Videos to One Clip with Single-step Sampling [82.0417131211353]
We propose a framework named OCSampler to explore a compact yet effective video representation with one short clip.
Our basic motivation is that efficient video recognition lies in processing a whole sequence at once rather than picking up frames sequentially.
arXiv Detail & Related papers (2022-01-12T09:50:38Z)
- Condensing a Sequence to One Informative Frame for Video Recognition [113.3056598548736]
This paper studies a two-step alternative that first condenses the video sequence to an informative "frame".
A valid question is how to define "useful information" and then distill it from a sequence down to one synthetic frame.
The proposed Informative Frame Synthesis (IFS) consistently demonstrates evident improvements on image-based 2D networks and clip-based 3D networks.
arXiv Detail & Related papers (2022-01-11T16:13:43Z)
- Local-Global Associative Frame Assemble in Video Re-ID [57.7470971197962]
Noisy and unrepresentative frames in automatically generated object bounding boxes from video sequences cause challenges in learning discriminative representations in video re-identification (Re-ID).
Most existing methods tackle this problem by assessing the importance of video frames according to either their local part alignments or global appearance correlations separately.
In this work, we explore jointly both local alignments and global correlations with further consideration of their mutual promotion/reinforcement.
arXiv Detail & Related papers (2021-10-22T19:07:39Z)
- No frame left behind: Full Video Action Recognition [26.37329995193377]
We propose full video action recognition and consider all video frames.
We first cluster all frame activations along the temporal dimension.
We then temporally aggregate the frames in the clusters into a smaller number of representations.
arXiv Detail & Related papers (2021-03-29T07:44:28Z)
- SF-Net: Single-Frame Supervision for Temporal Action Localization [60.202516362976645]
Single-frame supervision introduces extra temporal action signals while maintaining low annotation overhead.
We propose a unified system called SF-Net to make use of such single-frame supervision.
SF-Net significantly improves upon state-of-the-art weakly-supervised methods in terms of both segment localization and single-frame localization.
arXiv Detail & Related papers (2020-03-15T15:06:01Z)