Attribute-aware Identity-hard Triplet Loss for Video-based Person
Re-identification
- URL: http://arxiv.org/abs/2006.07597v1
- Date: Sat, 13 Jun 2020 09:15:38 GMT
- Title: Attribute-aware Identity-hard Triplet Loss for Video-based Person
Re-identification
- Authors: Zhiyuan Chen, Annan Li, Shilu Jiang, Yunhong Wang
- Abstract summary: Video-based person re-identification (Re-ID) is an important computer vision task.
We introduce a new metric learning method called Attribute-aware Identity-hard Triplet Loss (AITL).
To achieve a complete model of video-based person Re-ID, a multi-task framework with an Attribute-driven Spatio-Temporal Attention (ASTA) mechanism is also proposed.
- Score: 51.110453988705395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video-based person re-identification (Re-ID) is an important computer vision
task. The batch-hard triplet loss frequently used in video-based person Re-ID
suffers from the Distance Variance among Different Positives (DVDP) problem. In
this paper, we address this issue by introducing a new metric learning method
called Attribute-aware Identity-hard Triplet Loss (AITL), which reduces the
intra-class variation among positive samples via calculating attribute
distance. To achieve a complete model of video-based person Re-ID, a multi-task
framework with an Attribute-driven Spatio-Temporal Attention (ASTA) mechanism is
also proposed. Extensive experiments on the MARS and DukeMTMC-VID datasets show
that both AITL and ASTA are very effective. Enhanced by them, even a simple
lightweight video-based person Re-ID baseline can outperform existing
state-of-the-art approaches. The code has been published at
https://github.com/yuange250/Video-based-person-ReID-with-Attribute-information.
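For intuition, a minimal PyTorch sketch of an attribute-aware batch-hard triplet loss follows. It is a hypothetical reading of the abstract, not the authors' exact AITL formulation (the repository above holds the reference implementation); the function name, the `attr_weight` mixing factor, and the margin value are all assumptions.

```python
import torch
import torch.nn.functional as F

def attribute_aware_triplet_loss(feats, attrs, labels, margin=0.3, attr_weight=0.5):
    """Hypothetical sketch, not the authors' AITL. Batch-hard triplet loss
    whose positive mining also weighs attribute distance, so same-identity
    positives with inconsistent attributes are penalized harder -- one way
    to shrink the DVDP gap. Assumes PK sampling (every anchor has at least
    one positive and one negative in the batch).

    feats:  (B, D) appearance embeddings
    attrs:  (B, A) attribute vectors (e.g. predicted attribute scores)
    labels: (B,)   identity labels
    """
    dist = torch.cdist(feats, feats)        # (B, B) pairwise feature distances
    attr_dist = torch.cdist(attrs, attrs)   # (B, B) pairwise attribute distances
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    pos_mask = same_id & ~eye

    # Hardest positive per anchor, where "hard" mixes feature and attribute distance.
    d_pos = (dist + attr_weight * attr_dist).masked_fill(~pos_mask, float('-inf')).max(1).values
    # Hardest (closest) negative per anchor, on feature distance only.
    d_neg = dist.masked_fill(same_id, float('inf')).min(1).values
    return F.relu(d_pos - d_neg + margin).mean()
```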
Related papers
- Video Infringement Detection via Feature Disentanglement and Mutual
Information Maximization [51.206398602941405]
We propose to disentangle an original high-dimensional feature into multiple sub-features.
On top of the disentangled sub-features, we learn an auxiliary feature to enhance the sub-features.
Our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets the new state-of-the-art on the VCSL benchmark dataset.
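As a hedged sketch of the sub-feature idea (the summary does not spell out the objective), the snippet below splits an embedding into chunks and penalizes correlation between them; the chunking scheme and the decorrelation penalty are assumptions standing in for the paper's mutual-information objective.

```python
import torch
import torch.nn.functional as F

def split_and_decorrelate(feat, num_parts=4):
    """Hypothetical stand-in for the paper's disentanglement objective:
    split a (B, D) feature into num_parts sub-features (D must be divisible
    by num_parts) and penalize pairwise cosine similarity between parts.
    The cited paper maximizes mutual information instead; this decorrelation
    penalty is only a simple, commonly used proxy."""
    B, D = feat.shape
    parts = F.normalize(feat.reshape(B, num_parts, D // num_parts), dim=-1)
    penalty = feat.new_zeros(())
    for i in range(num_parts):
        for j in range(i + 1, num_parts):
            # Cosine similarity between part i and part j, averaged over the batch.
            penalty = penalty + (parts[:, i] * parts[:, j]).sum(-1).abs().mean()
    return parts, penalty
```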
arXiv Detail & Related papers (2023-09-13T10:53:12Z)
- Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification [78.08536797239893]
We propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two newly designed proxy embedding modules.
MSTAT consists of three stages to encode the attribute-associated, the identity-associated, and the attribute-identity-associated information from the video clips.
We show that MSTAT can achieve state-of-the-art accuracies on various standard benchmarks.
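The proxy embedding modules are not detailed in this summary. One plausible reading, sketched below, treats a proxy embedding as a set of learnable query tokens that cross-attend over per-frame features to aggregate a clip; the class name, dimensions, and head count are illustrative assumptions, not the exact MSTAT design.

```python
import torch
import torch.nn as nn

class ProxyEmbedding(nn.Module):
    """Illustrative proxy-embedding module: learnable proxy tokens
    cross-attend over per-frame features to aggregate a video clip.
    This shows the general idea of attention-based aggregation only."""
    def __init__(self, dim=512, num_proxies=4, num_heads=8):
        super().__init__()
        self.proxies = nn.Parameter(torch.randn(num_proxies, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, frame_feats):                       # frame_feats: (B, T, dim)
        B = frame_feats.size(0)
        q = self.proxies.unsqueeze(0).expand(B, -1, -1)   # (B, P, dim) queries
        out, _ = self.attn(q, frame_feats, frame_feats)   # (B, P, dim)
        return out.mean(dim=1)                            # (B, dim) clip embedding
```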
arXiv Detail & Related papers (2023-01-02T05:17:31Z)
- A High-Accuracy Unsupervised Person Re-identification Method Using Auxiliary Information Mined from Datasets [53.047542904329866]
We make use of auxiliary information mined from datasets for multi-modal feature learning.
This paper proposes three effective training tricks: Restricted Label Smoothing Cross Entropy Loss (RLSCE), Weight Adaptive Triplet Loss (WATL), and Dynamic Training Iterations (DTI).
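Of these tricks, plain label-smoothing cross entropy is standard and can be sketched directly; the "restricted" part of RLSCE is not described here, so the snippet below shows only the unrestricted base, with an assumed smoothing factor.

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    """Plain label-smoothing cross entropy, the base that RLSCE presumably
    restricts (the restriction is not described in the summary).

    logits: (B, C) class scores; targets: (B,) class indices."""
    num_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    # Smoothed target distribution: 1 - eps on the true class,
    # eps / (C - 1) spread uniformly over the others.
    with torch.no_grad():
        true_dist = torch.full_like(log_probs, eps / (num_classes - 1))
        true_dist.scatter_(1, targets.unsqueeze(1), 1.0 - eps)
    return -(true_dist * log_probs).sum(dim=1).mean()
```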
arXiv Detail & Related papers (2022-05-06T10:16:18Z)
- Video Person Re-identification using Attribute-enhanced Features [49.68392018281875]
We propose a novel network architecture named Attribute Salience Assisted Network (ASA-Net) for attribute-assisted video person Re-ID.
To better separate the target from the background, we propose to learn visual attention from mid-level attributes instead of high-level identities.
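The exact attention design is not given in this summary. A hedged sketch, below, uses a small convolutional head that predicts a spatial mask from mid-level features and reweights the feature map; the module name, channel sizes, and the idea of supervising the mask with attribute labels elsewhere are assumptions, not the ASA-Net architecture.

```python
import torch
import torch.nn as nn

class AttributeGuidedAttention(nn.Module):
    """Illustrative sketch of attribute-driven spatial attention: a 1x1
    conv head on mid-level features predicts a spatial mask, which
    reweights the feature map. The mask could be supervised with
    attribute labels in a separate loss (assumption)."""
    def __init__(self, channels=1024):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_map):             # feat_map: (B, C, H, W)
        mask = self.mask_head(feat_map)      # (B, 1, H, W) in [0, 1]
        return feat_map * mask, mask         # attended features + mask for auxiliary loss
```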
arXiv Detail & Related papers (2021-08-16T07:41:27Z)
- Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition [86.31412529187243]
Few-shot video recognition aims at learning new actions with only very few labeled samples.
We propose a depth-guided Adaptive Meta-Fusion Network for few-shot video recognition, termed AMeFu-Net.
arXiv Detail & Related papers (2020-10-20T03:06:20Z)
- Not 3D Re-ID: a Simple Single Stream 2D Convolution for Robust Video Re-identification [14.785070524184649]
Video-based Re-ID is an extension of earlier image-based re-identification methods.
We show superior performance from a simple single-stream 2D convolution network leveraging the ResNet50-IBN architecture.
Our approach uses best practices in video Re-ID and transfer learning between datasets to outperform existing state-of-the-art approaches.
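Such a baseline can be sketched as a 2D CNN applied per frame followed by temporal average pooling. The snippet below is an illustrative stand-in, not the paper's code: it uses torchvision's plain resnet50, since ResNet50-IBN comes from the separate IBN-Net project.

```python
import torch.nn as nn
from torchvision.models import resnet50

class Simple2DVideoReID(nn.Module):
    """Illustrative single-stream 2D-conv baseline: a 2D CNN runs on each
    frame independently, then features are average-pooled over time into
    one clip embedding. Plain torchvision resnet50 stands in for the
    paper's ResNet50-IBN backbone."""
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc head

    def forward(self, clip):                        # clip: (B, T, 3, H, W)
        B, T = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1))   # (B*T, 2048, 1, 1)
        return feats.flatten(1).view(B, T, -1).mean(1)  # temporal average pooling
```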
arXiv Detail & Related papers (2020-08-14T12:19:32Z)
- ESA-ReID: Entropy-Based Semantic Feature Alignment for Person re-ID [7.978877859859102]
Person re-identification (re-ID) is a challenging task in the real world. Besides its typical application in surveillance systems, re-ID is also of significant value for improving the recall rate of person identification in video content (TV or movies).
In this paper we propose an entropy-based semantic feature alignment model, which takes advantage of the detailed information in human semantic features.
Considering the uncertainty of semantic segmentation, we introduce a semantic alignment with an entropy-based mask, which can reduce the negative effects of segmentation errors.
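The entropy-based mask can be sketched directly from this description: compute the per-pixel entropy of the segmentation softmax and keep only low-entropy (confident) pixels. The threshold below is an assumed hyperparameter, not taken from the paper.

```python
import math
import torch

def entropy_mask(seg_logits, threshold=0.5):
    """Sketch of an entropy-based mask as described: pixels where the
    segmentation softmax is uncertain (high entropy) are suppressed so
    segmentation errors contribute less to the alignment.

    seg_logits: (B, C, H, W) raw segmentation scores."""
    probs = torch.softmax(seg_logits, dim=1)
    # Per-pixel entropy, normalized to [0, 1] by its maximum, log(C).
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    entropy = entropy / math.log(seg_logits.size(1))
    return (entropy < threshold).float()   # (B, H, W); 1 marks confident pixels
```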
arXiv Detail & Related papers (2020-07-09T08:56:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.