Video-based Person Re-identification without Bells and Whistles
- URL: http://arxiv.org/abs/2105.10678v1
- Date: Sat, 22 May 2021 10:17:38 GMT
- Title: Video-based Person Re-identification without Bells and Whistles
- Authors: Chih-Ting Liu, Jun-Cheng Chen, Chu-Song Chen, Shao-Yi Chien
- Abstract summary: Video-based person re-identification (Re-ID) aims at matching the video tracklets with cropped video frames for identifying the pedestrians under different cameras.
There exists severe spatial and temporal misalignment for those cropped tracklets due to the imperfect detection and tracking results generated with obsolete methods.
We present a simple re-Detect and Link (DL) module which can effectively reduce this unexpected noise by applying deep learning-based detection and tracking to the cropped tracklets.
- Score: 49.51670583977911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video-based person re-identification (Re-ID) aims at matching the video
tracklets with cropped video frames for identifying the pedestrians under
different cameras. However, there exists severe spatial and temporal
misalignment for those cropped tracklets due to the imperfect detection and
tracking results generated with obsolete methods. To address this issue, we
present a simple re-Detect and Link (DL) module which can effectively reduce
this unexpected noise by applying deep learning-based detection and
tracking on the cropped tracklets. Furthermore, we introduce an improved model
called Coarse-to-Fine Axial-Attention Network (CF-AAN). Based on the typical
Non-local Network, we replace the non-local module with three 1-D
position-sensitive axial attentions, in addition to our proposed coarse-to-fine
structure. With the developed CF-AAN, compared to the original non-local
operation, we can not only significantly reduce the computation cost but also
obtain the state-of-the-art performance (91.3% in rank-1 and 86.5% in mAP) on
the large-scale MARS dataset. Meanwhile, by simply adopting our DL module for
data alignment, to our surprise, several baseline models can achieve results
better than or comparable to the current state of the art. Besides, we discover
errors not only in the identity labels of tracklets but also in the
evaluation protocol for the MARS test data. We hope that our work can help
the community for the further development of invariant representation without
the hassle of the spatial and temporal alignment and dataset noise. The code,
corrected labels, evaluation protocol, and the aligned data will be available
at https://github.com/jackie840129/CF-AAN.
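The computation saving claimed for axial attention over the non-local operation can be illustrated with a rough count of pairwise affinities. This is only a sketch, not the authors' CF-AAN: it assumes the three 1-D attentions run along the temporal, height, and width axes of a T x H x W feature map, ignores the position-sensitive terms and the coarse-to-fine structure, and uses a hypothetical feature-map size.

```python
def nonlocal_pairs(t, h, w):
    """Affinities computed when every position attends to every other position."""
    n = t * h * w
    return n * n


def axial_pairs(t, h, w):
    """Affinities computed when each position attends along T, then H, then W only.

    Assumption: the three 1-D attentions are applied one per axis.
    """
    n = t * h * w
    return n * (t + h + w)


if __name__ == "__main__":
    # Hypothetical feature-map size, chosen only for illustration.
    t, h, w = 8, 16, 8
    full = nonlocal_pairs(t, h, w)
    axial = axial_pairs(t, h, w)
    print(f"non-local: {full} affinities")
    print(f"axial:     {axial} affinities")
    print(f"ratio:     {full / axial:.1f}x")
```

Under these assumptions the ratio is THW / (T + H + W), so the gap widens as the feature map grows.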
Related papers
- An accurate detection is not all you need to combat label noise in web-noisy datasets [23.020126612431746]
We show that direct estimation of the separating hyperplane can indeed offer an accurate detection of OOD samples.
We propose a hybrid solution that alternates between noise detection using linear separation and a state-of-the-art (SOTA) small-loss approach.
arXiv Detail & Related papers (2024-07-08T00:21:42Z)
- Unleashing the Potential of Tracklets for Unsupervised Video Person Re-Identification [40.83058938096914]
We propose the Self-Supervised Refined Clustering (SSR-C) framework to promote unsupervised video person re-identification.
Our proposed SSR-C for unsupervised video person re-identification achieves state-of-the-art results and is comparable to advanced supervised methods.
arXiv Detail & Related papers (2024-06-20T12:30:12Z)
- Camera Alignment and Weighted Contrastive Learning for Domain Adaptation in Video Person ReID [17.90248359024435]
Systems for person re-identification (ReID) can achieve a high accuracy when trained on large fully-labeled image datasets.
The domain shift associated with diverse operational capture conditions (e.g., camera viewpoints and lighting) may translate to a significant decline in performance.
This paper focuses on unsupervised domain adaptation (UDA) for video-based ReID.
arXiv Detail & Related papers (2022-11-07T15:32:56Z)
- A Free Lunch to Person Re-identification: Learning from Automatically Generated Noisy Tracklets [52.30547023041587]
Unsupervised video-based re-identification (re-ID) methods have been proposed to solve the problem of high labor cost required to annotate re-ID datasets.
But their performance is still far lower than the supervised counterparts.
In this paper, we propose to tackle this problem by learning re-ID models from automatically generated person tracklets.
arXiv Detail & Related papers (2022-04-02T16:18:13Z)
- Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
- Learning to segment from misaligned and partial labels [0.0]
Many non-urban settings lack the ground-truth needed for accurate segmentation.
Open source infrastructure annotations like OpenStreetMaps (OSM) are representative of this issue.
We present a novel and generalizable two-stage framework that enables improved pixel-wise image segmentation given misaligned and missing annotations.
arXiv Detail & Related papers (2020-05-27T06:02:58Z)
- Detection in Crowded Scenes: One Proposal, Multiple Predictions [79.28850977968833]
We propose a proposal-based object detector, aiming at detecting highly-overlapped instances in crowded scenes.
The key to our approach is to let each proposal predict a set of correlated instances rather than a single one, as in previous proposal-based frameworks.
Our detector can obtain 4.9% AP gains on the challenging CrowdHuman dataset and 1.0% $\text{MR}^{-2}$ improvements on the CityPersons dataset.
arXiv Detail & Related papers (2020-03-20T09:48:53Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
- Solving Missing-Annotation Object Detection with Background Recalibration Loss [49.42997894751021]
This paper focuses on a novel and challenging detection scenario: A majority of true objects/instances is unlabeled in the datasets.
Previous art has proposed to use soft sampling to re-weight the gradients of RoIs based on the overlaps with positive instances, while their method is mainly based on the two-stage detector.
In this paper, we introduce a superior solution called Background Recalibration Loss (BRL) that can automatically re-calibrate the loss signals according to the pre-defined IoU threshold and input image.
arXiv Detail & Related papers (2020-02-12T23:11:46Z)