Keypoint Message Passing for Video-based Person Re-Identification
- URL: http://arxiv.org/abs/2111.08279v1
- Date: Tue, 16 Nov 2021 08:01:16 GMT
- Title: Keypoint Message Passing for Video-based Person Re-Identification
- Authors: Di Chen, Andreas Doering, Shanshan Zhang, Jian Yang, Juergen Gall,
Bernt Schiele
- Abstract summary: Video-based person re-identification (re-ID) is an important technique in visual surveillance systems which aims to match video snippets of people captured by different cameras.
Existing methods are mostly based on convolutional neural networks (CNNs), whose building blocks either process local neighbor pixels at a time, or, when 3D convolutions are used to model temporal information, suffer from the misalignment problem caused by person movement.
In this paper, we propose to overcome the limitations of normal convolutions with a human-oriented graph method. Specifically, features located at person joint keypoints are extracted and connected as a spatial-temporal graph
- Score: 106.41022426556776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video-based person re-identification (re-ID) is an important technique in
visual surveillance systems which aims to match video snippets of people
captured by different cameras. Existing methods are mostly based on
convolutional neural networks (CNNs), whose building blocks either process
local neighbor pixels at a time, or, when 3D convolutions are used to model
temporal information, suffer from the misalignment problem caused by person
movement. In this paper, we propose to overcome the limitations of normal
convolutions with a human-oriented graph method. Specifically, features located
at person joint keypoints are extracted and connected as a spatial-temporal
graph. These keypoint features are then updated by message passing from their
connected nodes with a graph convolutional network (GCN). During training, the
GCN can be attached to any CNN-based person re-ID model to assist
representation learning on feature maps, whilst it can be dropped after
training for better inference speed. Our method brings significant improvements
over the CNN-based baseline model on the MARS dataset with generated person
keypoints and a newly annotated dataset: PoseTrackReID. It also defines a new
state-of-the-art method in terms of top-1 accuracy and mean average precision
in comparison to prior works.
Related papers
- Networked Time Series Imputation via Position-aware Graph Enhanced
Variational Autoencoders [31.953958053709805]
We design a new model named PoGeVon which leverages variational autoencoder (VAE) to predict missing values over both node time series features and graph structures.
Experiment results demonstrate the effectiveness of our model over baselines.
arXiv Detail & Related papers (2023-05-29T21:11:34Z) - Pose-Aided Video-based Person Re-Identification via Recurrent Graph
Convolutional Network [41.861537712563816]
We propose to learn the discriminative pose feature beyond the appearance feature for video retrieval.
To learn the pose feature, we first detect the pedestrian pose in each frame through an off-the-shelf pose detector.
We then exploit a recurrent graph convolutional network (RGCN) to learn the node embeddings of the temporal pose graph.
arXiv Detail & Related papers (2022-09-23T13:20:33Z) - Gate-Shift-Fuse for Video Action Recognition [43.8525418821458]
Gate-Fuse (GSF) is a novel-temporal feature extraction module which controls interactions in-temporal decomposition and learns to adaptively route features through time and combine them in a data dependent manner.
GSF can be inserted into existing 2D CNNs to convert them into efficient and high performing, with negligible parameter and compute overhead.
We perform an extensive analysis of GSF using two popular 2D CNN families and achieve state-of-the-art or competitive performance on five standard action recognition benchmarks.
arXiv Detail & Related papers (2022-03-16T19:19:04Z) - A Novel Hand Gesture Detection and Recognition system based on
ensemble-based Convolutional Neural Network [3.5665681694253903]
Detection of hand portion has become a challenging task in computer vision and pattern recognition communities.
Deep learning algorithm like convolutional neural network (CNN) architecture has become a very popular choice for classification tasks.
In this paper, an ensemble of CNN-based approaches is presented to overcome some problems like high variance during prediction, overfitting problem and also prediction errors.
arXiv Detail & Related papers (2022-02-25T06:46:58Z) - Locally Aware Piecewise Transformation Fields for 3D Human Mesh
Registration [67.69257782645789]
We propose piecewise transformation fields that learn 3D translation vectors to map any query point in posed space to its correspond position in rest-pose space.
We show that fitting parametric models with poses by our network results in much better registration quality, especially for extreme poses.
arXiv Detail & Related papers (2021-04-16T15:16:09Z) - Spatial-Temporal Correlation and Topology Learning for Person
Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z) - Spatio-Temporal Inception Graph Convolutional Networks for
Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
arXiv Detail & Related papers (2020-11-26T14:43:04Z) - Video-based Facial Expression Recognition using Graph Convolutional
Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely-used datasets, CK+, Oulu-CASIA and MMI, and also one challenging wild dataset AFEW8.0.
arXiv Detail & Related papers (2020-10-26T07:31:51Z) - High-Order Information Matters: Learning Relation and Topology for
Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms state-of-the-art by6.5%mAP scores on Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.