Dense Interaction Learning for Video-based Person Re-identification
- URL: http://arxiv.org/abs/2103.09013v2
- Date: Thu, 18 Mar 2021 07:03:30 GMT
- Title: Dense Interaction Learning for Video-based Person Re-identification
- Authors: Tianyu He, Xin Jin, Xu Shen, Jianqiang Huang, Zhibo Chen, Xian-Sheng
Hua
- Abstract summary: We propose a hybrid framework, Dense Interaction Learning (DenseIL), to tackle video-based person re-ID difficulties.
DenseIL contains a CNN encoder and a Dense Interaction (DI) decoder.
In experiments, DenseIL consistently and significantly outperforms all state-of-the-art methods on multiple standard video-based re-ID datasets.
- Score: 75.03200492219003
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video-based person re-identification (re-ID) aims at matching the same person
across video clips. Efficiently exploiting multi-scale fine-grained features
while building the structural interaction among them is pivotal for its
success. In this paper, we propose a hybrid framework, Dense Interaction
Learning (DenseIL), that takes the principal advantages of both CNN-based and
Attention-based architectures to tackle video-based person re-ID difficulties.
DenseIL contains a CNN encoder and a Dense Interaction (DI) decoder. The CNN
encoder is responsible for efficiently extracting discriminative spatial
features while the DI decoder is designed to densely model spatial-temporal
inherent interaction across frames. Different from previous works, we
additionally let the DI decoder densely attend to intermediate fine-grained
CNN features, which naturally yields multi-grained spatial-temporal
representation for each video clip. Moreover, we introduce Spatio-TEmporal
Positional Embedding (STEP-Emb) into the DI decoder to investigate the
positional relation among the spatial-temporal inputs. In our experiments,
DenseIL consistently and significantly outperforms all state-of-the-art
methods on multiple standard video-based re-ID datasets.
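To make the architecture concrete, below is a minimal PyTorch sketch of the hybrid design the abstract describes: a CNN encoder whose intermediate feature maps are flattened into spatial-temporal tokens, a Transformer-style decoder whose learned clip queries densely attend to those multi-grained tokens, and an additive spatio-temporal positional embedding standing in for STEP-Emb. All names (`DenseILSketch`, `STEPEmb`), sizes, and the exact token layout are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch, assuming a two-stage CNN encoder and a standard
# TransformerDecoder as the DI decoder; NOT the authors' implementation.
import torch
import torch.nn as nn

class STEPEmb(nn.Module):
    """Assumed additive form of a spatio-temporal positional embedding."""
    def __init__(self, dim, max_hw=64, max_t=8):
        super().__init__()
        self.spatial = nn.Embedding(max_hw, dim)    # one slot per spatial location
        self.temporal = nn.Embedding(max_t, dim)    # one slot per frame index

    def forward(self, tokens, t, hw):
        # tokens: (B, T*HW, dim), frame-major layout
        s_idx = torch.arange(hw, device=tokens.device).repeat(t)
        t_idx = torch.arange(t, device=tokens.device).repeat_interleave(hw)
        return tokens + self.spatial(s_idx) + self.temporal(t_idx)

class DenseILSketch(nn.Module):
    def __init__(self, dim=256, n_queries=4, n_layers=2):
        super().__init__()
        # Toy CNN encoder; a real system would use e.g. a ResNet backbone.
        self.stage1 = nn.Sequential(nn.Conv2d(3, dim, 7, 4, 3), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(dim, dim, 3, 2, 1), nn.ReLU())
        self.step_emb = STEPEmb(dim)
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)

    def forward(self, clip):                         # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        x = clip.flatten(0, 1)                       # (B*T, 3, H, W)
        f1 = self.stage1(x)                          # intermediate, fine-grained
        f2 = self.stage2(f1)                         # deeper, coarser
        tokens = []
        for f in (f1, f2):                           # expose BOTH granularities
            hw = f.shape[-2] * f.shape[-1]
            tok = f.flatten(2).transpose(1, 2).reshape(b, t * hw, -1)
            tokens.append(self.step_emb(tok, t, hw))
        memory = torch.cat(tokens, dim=1)            # dense multi-grained memory
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        out = self.decoder(q, memory)                # queries attend across all tokens
        return out.mean(dim=1)                       # clip-level representation
```

Under these toy sizes, `DenseILSketch()(torch.randn(2, 4, 3, 32, 32))` yields a (2, 256) clip embedding; concatenating tokens from both CNN stages is what makes the decoder's attention "dense" across feature granularities in this sketch. The embedding tables are sized for 32x32 inputs and up to 8 frames.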
Related papers
- VrdONE: One-stage Video Visual Relation Detection [30.983521962897477]
Video Visual Relation Detection (VidVRD) focuses on understanding how entities interact over time and space in videos.
Traditional methods for VidVRD, challenged by its complexity, typically split the task into two parts: one for identifying which relations are present and another for determining their temporal boundaries.
We propose VrdONE, a streamlined yet efficacious one-stage model for VidVRD.
arXiv Detail & Related papers (2024-08-18T08:38:20Z) - CSTA: CNN-based Spatiotemporal Attention for Video Summarization [0.24578723416255752]
We propose a CNN-based SpatioTemporal Attention (CSTA) method that stacks each feature of frames from a single video to form image-like frame representations.
Our methodology relies on CNN to comprehend the inter and intra-frame relations and to find crucial attributes in videos by exploiting its ability to learn absolute positions within images.
arXiv Detail & Related papers (2024-05-20T09:38:37Z) - Deeply-Coupled Convolution-Transformer with Spatial-temporal
- Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID.
Our framework attains better performance than most state-of-the-art methods.
arXiv Detail & Related papers (2023-04-27T12:16:44Z) - Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, the EndoVis18 Challenge and the CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z) - Event and Activity Recognition in Video Surveillance for Cyber-Physical
- Event and Activity Recognition in Video Surveillance for Cyber-Physical Systems [0.0]
We show that the long-term motion patterns alone play a pivotal role in the task of recognizing an event.
Only the temporal features are exploited using a hybrid Convolutional Neural Network (CNN) + Recurrent Neural Network (RNN) architecture.
arXiv Detail & Related papers (2021-11-03T08:30:38Z) - Spatial-Temporal Correlation and Topology Learning for Person
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and the physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z) - Self-supervised Video Representation Learning by Uncovering
- Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
arXiv Detail & Related papers (2020-08-31T08:31:56Z) - Co-Saliency Spatio-Temporal Interaction Network for Person
- Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos [85.6430597108455]
We propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.
It captures the common salient foreground regions among video frames and explores the spatial-temporal long-range context interdependency from such regions.
Multiple spatial-temporal interaction modules within CSTNet exploit the long-range spatial and temporal context interdependencies of these features and their spatial-temporal correlation.
arXiv Detail & Related papers (2020-04-10T10:23:58Z)