A Flow-Guided Mutual Attention Network for Video-Based Person
Re-Identification
- URL: http://arxiv.org/abs/2008.03788v2
- Date: Sun, 4 Oct 2020 23:56:23 GMT
- Title: A Flow-Guided Mutual Attention Network for Video-Based Person
Re-Identification
- Authors: Madhu Kiran, Amran Bhuiyan, Louis-Antoine Blais-Morin, Mehrsan Javan,
Ismail Ben Ayed, Eric Granger
- Abstract summary: Person ReID is a challenging problem in many video analytics and surveillance applications.
Video-based person ReID has recently gained much interest because it allows capturing discriminant spatio-temporal information.
In this paper, the motion pattern of a person is explored as an additional cue for ReID.
- Score: 25.217641512619178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person Re-Identification (ReID) is a challenging problem in many video
analytics and surveillance applications, where a person's identity must be
associated across a distributed non-overlapping network of cameras. Video-based
person ReID has recently gained much interest because it allows capturing
discriminant spatio-temporal information from video clips that is unavailable
for image-based ReID. Despite recent advances, deep learning (DL) models for
video ReID often fail to leverage this information to improve the robustness of
feature representations. In this paper, the motion pattern of a person is
explored as an additional cue for ReID. In particular, a flow-guided Mutual
Attention network is proposed for fusion of image and optical flow sequences
using any 2D-CNN backbone, allowing temporal information to be encoded along with
spatial appearance information. Our Mutual Attention network relies on the
joint spatial attention between image and optical flow feature maps to
activate a common set of salient features across them. In addition to
flow-guided attention, we introduce a method to aggregate features from longer
input streams for better video sequence-level representation. Our extensive
experiments on three challenging video ReID datasets indicate that the proposed
Mutual Attention network considerably improves recognition accuracy with
respect to conventional gated-attention networks, as well as
state-of-the-art methods for video-based person ReID.
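To make the described fusion concrete, here is a minimal PyTorch sketch of joint spatial attention between image and optical-flow feature maps, followed by an attention-weighted clip-level aggregation. The 1x1-conv saliency projections, the additive combination of the two saliency maps, and the softmax pooling are our assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn

class MutualSpatialAttention(nn.Module):
    """Joint spatial attention between image and optical-flow feature maps.

    A single attention map is computed from both streams and applied to
    each, so that only locations salient in both modalities stay active.
    """
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs project each stream to a one-channel saliency map
        self.img_proj = nn.Conv2d(channels, 1, kernel_size=1)
        self.flow_proj = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, f_img: torch.Tensor, f_flow: torch.Tensor):
        # f_img, f_flow: (B, C, H, W) maps from the same 2D-CNN stage
        joint = torch.sigmoid(self.img_proj(f_img) + self.flow_proj(f_flow))
        return f_img * joint, f_flow * joint  # reweight both streams

def aggregate_clip(frame_feats: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    """Aggregate per-frame embeddings from a longer input stream into one
    sequence-level descriptor via softmax-weighted averaging (assumed form)."""
    # frame_feats: (B, T, D) embeddings; scores: (B, T) quality logits
    w = torch.softmax(scores, dim=1).unsqueeze(-1)  # (B, T, 1)
    return (w * frame_feats).sum(dim=1)             # (B, D)
```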
Related papers
- Pyramid Attention Network for Medical Image Registration [4.142556531859984]
We propose a pyramid attention network (PAN) for deformable medical image registration.
PAN incorporates a dual-stream pyramid encoder with channel-wise attention to boost the feature representation.
Our method achieves favorable registration performance, while outperforming several CNN-based and Transformer-based registration networks.
arXiv Detail & Related papers (2024-02-14T08:46:18Z)
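The PAN summary above mentions channel-wise attention inside a dual-stream pyramid encoder but does not spell it out; below is a minimal squeeze-and-excitation-style sketch of channel-wise attention. The module name, reduction ratio, and sigmoid gating are assumptions, not PAN's actual formulation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (illustrative only)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze spatial dims to 1x1
        self.gate = nn.Sequential(           # excite: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.gate(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight channels to boost the representation
```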
- Feature Disentanglement Learning with Switching and Aggregation for Video-based Person Re-Identification [9.068045610800667]
In video person re-identification (Re-ID), the network must consistently extract features of the target person from successive frames.
Existing methods tend to focus only on how to use temporal information, which often leads to networks being fooled by similar appearances and the same backgrounds.
We propose the Disentanglement and Switching and Aggregation Network (DSANet), which separates identity-related features from camera-characteristic features and pays more attention to ID information.
arXiv Detail & Related papers (2022-12-16T04:27:56Z)
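As a rough illustration of the feature-segregation idea in DSANet above, the sketch below splits a backbone feature into an identity branch and a camera-characteristics branch with two hypothetical projection heads; DSANet's actual switching and aggregation design is more involved.

```python
import torch
import torch.nn as nn

class IDCameraDisentangler(nn.Module):
    """Split one feature vector into identity- and camera-related parts
    (hypothetical heads; the real DSANet design differs in detail)."""
    def __init__(self, in_dim: int, id_dim: int, cam_dim: int):
        super().__init__()
        self.id_head = nn.Linear(in_dim, id_dim)    # identity information
        self.cam_head = nn.Linear(in_dim, cam_dim)  # camera characteristics

    def forward(self, feat: torch.Tensor):
        # feat: (B, in_dim) pooled backbone feature
        return self.id_head(feat), self.cam_head(feat)

# In training, the identity branch would be supervised with ID labels and
# the camera branch with camera labels, e.g. via two cross-entropy losses,
# so that ID features carry as little camera information as possible.
```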
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
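One plausible reading of the co-attention formulation mentioned above, sketched in PyTorch: both feature maps are projected to a shared embedding, a cross-affinity matrix is computed, and each map is reweighted by attention pooled from that affinity. The projection dimension and pooling choices are our assumptions, not necessarily the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttention(nn.Module):
    """Affinity-based co-attention between low- and high-level feature maps
    (an illustrative formulation, not necessarily the paper's)."""
    def __init__(self, low_ch: int, high_ch: int, dim: int = 128):
        super().__init__()
        self.proj_low = nn.Conv2d(low_ch, dim, kernel_size=1)
        self.proj_high = nn.Conv2d(high_ch, dim, kernel_size=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor):
        b, _, h, w = low.shape
        # Bring the coarse high-level map to the low-level resolution
        high = F.interpolate(high, size=(h, w), mode="bilinear",
                             align_corners=False)
        l = self.proj_low(low).flatten(2)        # (B, D, HW)
        g = self.proj_high(high).flatten(2)      # (B, D, HW)
        aff = torch.bmm(l.transpose(1, 2), g)    # (B, HW, HW) cross-affinity
        att_low = torch.softmax(aff.mean(dim=2), dim=1).view(b, 1, h, w)
        att_high = torch.softmax(aff.mean(dim=1), dim=1).view(b, 1, h, w)
        return low * att_low, high * att_high    # mutually attended maps
```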
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel Correlation and Topology Learning (CTL) framework to pursue discriminative and robust representations by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and the physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
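To illustrate the pipeline described for CTL above, the sketch below pools a local feature at each estimated body key-point and runs one generic graph-convolution step over a body-topology adjacency; the nearest-cell sampling scheme and the single GCN layer are simplifications of CTL's context-reinforced topology learning.

```python
import torch
import torch.nn as nn

def keypoint_features(fmap: torch.Tensor, kpts: torch.Tensor) -> torch.Tensor:
    """Pick the feature vector under each key-point.
    fmap: (B, C, H, W) CNN features; kpts: (B, K, 2) normalized (x, y)."""
    B, C, H, W = fmap.shape
    xs = (kpts[..., 0] * (W - 1)).long().clamp(0, W - 1)   # (B, K)
    ys = (kpts[..., 1] * (H - 1)).long().clamp(0, H - 1)
    idx = (ys * W + xs).unsqueeze(1).expand(-1, C, -1)     # (B, C, K)
    return fmap.flatten(2).gather(2, idx).transpose(1, 2)  # (B, K, C)

class GraphLayer(nn.Module):
    """One message-passing step over a body-topology graph (generic GCN)."""
    def __init__(self, dim: int):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (B, K, D) key-point features; adj: (K, K) normalized adjacency
        # encoding physical connections (and, in CTL, learned context links)
        return torch.relu(self.lin(adj @ x))
```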
- Watching You: Global-guided Reciprocal Learning for Video-based Person Re-identification [82.6971648465279]
We propose a novel Global-guided Reciprocal Learning framework for video-based person Re-ID.
Our approach can achieve better performance than other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-07T12:27:42Z)
- Not 3D Re-ID: a Simple Single Stream 2D Convolution for Robust Video Re-identification [14.785070524184649]
Video-based Re-ID is an expansion of earlier image-based re-identification methods.
We show superior performance from a simple single-stream 2D convolutional network leveraging the ResNet50-IBN architecture.
Our approach combines video Re-ID best practices with transfer learning between datasets to outperform existing state-of-the-art approaches.
arXiv Detail & Related papers (2020-08-14T12:19:32Z)
- Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos [85.6430597108455]
We propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.
It captures the common salient foreground regions among video frames and explores the spatial-temporal long-range context interdependency from such regions.
Multiple spatial-temporal interaction modules within CSTNet exploit the long-range spatial and temporal context interdependencies of such features and their spatial-temporal correlation.
arXiv Detail & Related papers (2020-04-10T10:23:58Z)
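A simplified reading of the co-saliency idea in CSTNet above: estimate a saliency map per frame, average the maps over the clip, and gate every frame with the shared mask so the common foreground dominates. The single 1x1-conv estimator and clip-mean sharing are assumptions; CSTNet's interaction modules are more elaborate.

```python
import torch
import torch.nn as nn

class CoSaliencyMask(nn.Module):
    """Gate all frames with a clip-shared saliency mask (illustrative)."""
    def __init__(self, channels: int):
        super().__init__()
        self.sal = nn.Conv2d(channels, 1, kernel_size=1)  # per-frame saliency

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, C, H, W) per-frame feature maps
        b, t, c, h, w = feats.shape
        s = self.sal(feats.view(b * t, c, h, w)).view(b, t, 1, h, w)
        common = torch.sigmoid(s.mean(dim=1, keepdim=True))  # (B, 1, 1, H, W)
        return feats * common  # emphasize regions salient across the clip
```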
- Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification [98.7585431239291]
Video-based person re-identification aims at matching the same person across video clips.
In this paper, we propose an attentive feature aggregation module, namely Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA).
Our framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2020-03-27T03:49:21Z)
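As a rough, single-granularity illustration of the reference-aided attentive aggregation named above: score each frame feature against a global reference (here simply the clip mean) and pool with the resulting weights. The reference choice and scoring MLP are assumptions; MG-RAFA operates at multiple granularities.

```python
import torch
import torch.nn as nn

class ReferenceAidedAggregation(nn.Module):
    """Attend to frames with the help of a global reference (illustrative)."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(inplace=True), nn.Linear(dim, 1)
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, D) per-frame features; reference = clip mean
        ref = feats.mean(dim=1, keepdim=True).expand_as(feats)  # (B, T, D)
        w = torch.softmax(self.score(torch.cat([feats, ref], dim=-1)), dim=1)
        return (w * feats).sum(dim=1)  # (B, D) clip-level representation
```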
This list is automatically generated from the titles and abstracts of the papers in this site.