BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video
Person Re-Identification
- URL: http://arxiv.org/abs/2104.14783v1
- Date: Fri, 30 Apr 2021 06:44:34 GMT
- Title: BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video
Person Re-Identification
- Authors: Ruibing Hou, Hong Chang, Bingpeng Ma, Rui Huang and Shiguang Shan
- Abstract summary: We present an efficient spatial-temporal representation for video person re-identification (reID).
We propose a Bilateral Complementary Network (BiCnet) for spatial complementarity modeling.
BiCnet-TKS outperforms state-of-the-art methods with about 50% less computation.
- Score: 86.73532136686438
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present an efficient spatial-temporal representation for
video person re-identification (reID). Firstly, we propose a Bilateral
Complementary Network (BiCnet) for spatial complementarity modeling.
Specifically, BiCnet contains two branches. Detail Branch processes frames at
original resolution to preserve the detailed visual clues, and Context Branch
with a down-sampling strategy is employed to capture long-range contexts. On
each branch, BiCnet appends multiple parallel and diverse attention modules to
discover divergent body parts for consecutive frames, so as to obtain an
integral characteristic of target identity. Furthermore, a Temporal Kernel
Selection (TKS) block is designed to capture short-term as well as long-term
temporal relations by an adaptive mode. TKS can be inserted into BiCnet at any
depth to construct BiCnet-TKS for spatial-temporal modeling. Experimental
results on multiple benchmarks show that BiCnet-TKS outperforms
state-of-the-art methods with about 50% less computation. The source code is
available at https://github.com/blue-blue272/BiCnet-TKS.
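The abstract describes the architecture at a high level but not its implementation. The sketch below is a minimal PyTorch reading of the two-branch spatial design: a Detail Branch at original resolution, a Context Branch on down-sampled frames, and parallel attention heads meant to cover divergent body parts. Module names, channel handling, and the 1x1-convolution attention form are illustrative assumptions, not the authors' code (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartAttention(nn.Module):
    """One of several parallel attention heads; each head is meant to
    highlight a different body part (a diversity term, not shown here,
    would push the heads toward divergent regions)."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                        # x: (B*T, C, H, W)
        return x * torch.sigmoid(self.score(x))  # soft spatial mask

class BiCnetBlock(nn.Module):
    """Detail Branch at full resolution plus Context Branch on 2x
    down-sampled frames, fused and fed to parallel attention heads."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.detail = nn.Conv2d(channels, channels, 3, padding=1)
        self.context = nn.Conv2d(channels, channels, 3, padding=1)
        self.heads = nn.ModuleList(
            [PartAttention(channels) for _ in range(num_heads)])

    def forward(self, x):                        # x: (B*T, C, H, W)
        d = F.relu(self.detail(x))               # detailed visual clues
        c = F.relu(self.context(F.avg_pool2d(x, 2)))
        c = F.interpolate(c, size=x.shape[-2:])  # long-range context, upsampled
        fused = d + c
        return sum(h(fused) for h in self.heads) / len(self.heads)
```

The TKS block is described as adaptively capturing short- and long-term temporal relations. One plausible reading, sketched below, is a selective-kernel-style gate over two temporal convolutions with different receptive fields; the kernel sizes (3 and 7) and the gating MLP are assumptions for illustration.

```python
class TemporalKernelSelection(nn.Module):
    """TKS-style block: two temporal convolutions with different kernel
    sizes, fused by learned, input-dependent branch weights."""
    def __init__(self, channels, short_k=3, long_k=7, reduction=4):
        super().__init__()
        self.short = nn.Conv1d(channels, channels, short_k, padding=short_k // 2)
        self.long = nn.Conv1d(channels, channels, long_k, padding=long_k // 2)
        hidden = max(channels // reduction, 8)
        self.gate = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 2))

    def forward(self, x):                        # x: (B, C, T), spatially pooled
        s, l = self.short(x), self.long(x)       # short- vs long-term relations
        w = torch.softmax(self.gate((s + l).mean(dim=2)), dim=1)  # (B, 2)
        return w[:, 0, None, None] * s + w[:, 1, None, None] * l

tks = TemporalKernelSelection(channels=256)
clip = torch.randn(8, 256, 6)                    # 8 tracklets, 6 frames each
out = tks(clip)                                  # (8, 256, 6), adaptively fused
```

Because the gate is input-dependent and the block is shape-preserving, such a module can sit at any depth of the spatial backbone, which matches the abstract's claim that TKS can be inserted into BiCnet at any depth.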
Related papers
- Bilateral Network with Residual U-blocks and Dual-Guided Attention for
Real-time Semantic Segmentation [18.393208069320362]
We design a new fusion mechanism for two-branch architectures that is guided by attention computation.
To be precise, we replace some multi-scale transformations with the proposed Dual-Guided Attention (DGA) module.
Experiments on the Cityscapes and CamVid datasets show the effectiveness of our method.
arXiv Detail & Related papers (2023-10-31T09:20:59Z) - FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals
in Factorized Orthogonal Latent Space [7.324708513042455]
This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals.
It consistently outperforms the state-of-the-art baselines in downstream tasks by a clear margin.
arXiv Detail & Related papers (2023-10-30T22:55:29Z) - C2F-TCN: A Framework for Semi and Fully Supervised Temporal Action
Segmentation [20.182928938110923]
Temporal action segmentation tags action labels for every frame in an input untrimmed video containing multiple actions in a sequence.
We propose an encoder-decoder-style architecture named C2F-TCN featuring a "coarse-to-fine" ensemble of decoder outputs.
We show that the architecture is flexible for both supervised and representation learning.
arXiv Detail & Related papers (2022-12-20T14:53:46Z) - Towards Similarity-Aware Time-Series Classification [51.2400839966489]
We study time-series classification (TSC), a fundamental task of time-series data mining.
We propose Similarity-Aware Time-Series Classification (SimTSC), a framework that models similarity information with graph neural networks (GNNs); a generic sketch of this idea appears after this list.
arXiv Detail & Related papers (2022-01-05T02:14:57Z) - Multi-Scale Local-Temporal Similarity Fusion for Continuous Sign
Language Recognition [4.059599144668737]
Continuous sign language recognition is a publicly significant task that transcribes a sign language video into an ordered gloss sequence.
One promising way is to adopt a one-dimensional convolutional network (1D-CNN) to temporally fuse the sequential frames.
We propose to adaptively fuse local features via temporal similarity for this task.
arXiv Detail & Related papers (2021-07-27T12:06:56Z) - Prototypical Cross-Attention Networks for Multiple Object Tracking and
Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes.
We propose the Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information online.
PCAN outperforms current video instance tracking and segmentation competition winners on Youtube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z) - Batch Coherence-Driven Network for Part-aware Person Re-Identification [79.33809815035127]
Existing part-aware person re-identification methods typically employ two separate steps, namely body part detection and part-level feature extraction.
We propose the Batch Coherence-Driven Network (BCDNet), which bypasses body part detection during both the training and testing phases while still learning semantically aligned features.
arXiv Detail & Related papers (2020-09-21T09:04:13Z) - ASAP-Net: Attention and Structure Aware Point Cloud Sequence
Segmentation [49.15948235059343]
We further improve spatio-temporal point cloud feature learning with a flexible module called ASAP.
Our ASAP module contains an attentive temporal embedding layer to fuse the relatively informative local features across frames in a recurrent fashion.
We show the generalization ability of the proposed ASAP module with different computation backbone networks for point cloud sequence segmentation.
arXiv Detail & Related papers (2020-08-12T07:37:16Z) - Co-Saliency Spatio-Temporal Interaction Network for Person
Re-Identification in Videos [85.6430597108455]
We propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.
It captures the common salient foreground regions among video frames and explores the spatial-temporal long-range context interdependency from such regions.
Multiple spatial-temporal interaction modules within CSTNet are proposed to exploit the spatial and temporal long-range context interdependencies on such features, together with their spatial-temporal information correlation.
arXiv Detail & Related papers (2020-04-10T10:23:58Z)
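The SimTSC entry above describes modeling pairwise series similarity with graph neural networks. Below is a generic, hedged sketch of that idea, assuming a dense k-nearest-neighbour graph built from a distance matrix and a hand-rolled mean-aggregation graph convolution; the actual SimTSC design (which uses DTW distances) may differ in detail.

```python
import torch
import torch.nn as nn

def similarity_adjacency(dist, k=3):
    """Dense adjacency from a pairwise distance matrix: connect each
    series to its k nearest neighbours, weighted by exp(-distance)."""
    n = dist.size(0)
    adj = torch.zeros_like(dist)
    knn = dist.topk(k + 1, largest=False).indices        # k+1 includes self
    rows = torch.arange(n).unsqueeze(1).expand_as(knn)
    adj[rows, knn] = torch.exp(-dist[rows, knn])
    return (adj + adj.t()) / 2                           # symmetrise

class MeanGCNLayer(nn.Module):
    """Minimal graph convolution: mean-aggregate neighbour features,
    then apply a shared linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):                           # x: (n, d)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1e-6)
        return torch.relu(self.lin(adj @ x / deg))

feats = torch.randn(16, 64)                   # per-series backbone features
adj = similarity_adjacency(torch.cdist(feats, feats))   # Euclidean stand-in for DTW
refined = MeanGCNLayer(64, 64)(feats, adj)    # (16, 64) similarity-aware features
```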
This list is automatically generated from the titles and abstracts of the papers on this site.